Only questions:
Q46. How do you find out who is using your generic graph?
Q47. Which is better: joining in the database, or unloading and joining in Ab Initio? Explain why.
Q48. Where do you maintain connectivity information for external servers/applications?
Q49. What do you take into consideration when you introduce a phase break? (What reasons lead you to use a phase break?)
Q50. What type of parallelism do you break by using phase breaks in a graph?
Scenario Questions:
Q51. For any duplicate route entries (including reversed source/destination pairs), only a single record should be present in the output.
E.g.
Source_city Destination_city kms
Mumbai Pune 200
Pune Hyderabad 550
Hyderabad Pune 550
Pune Mumbai 200
Output should be:
Mumbai Pune 200
Pune Hyderabad 550
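A common approach to Q51 is to build an order-independent key from the two cities and keep the first record per key (in a graph, roughly a Reformat to build the key, then Sort + Dedup Sorted; the component choice is a suggestion, not from the question). A minimal Python sketch of the logic using the sample data:

```python
# Dedup routes where A->B and B->A count as the same route.
routes = [
    ("Mumbai", "Pune", 200),
    ("Pune", "Hyderabad", 550),
    ("Hyderabad", "Pune", 550),
    ("Pune", "Mumbai", 200),
]

seen = set()
output = []
for src, dst, kms in routes:
    key = frozenset((src, dst))   # same key regardless of direction
    if key not in seen:
        seen.add(key)
        output.append((src, dst, kms))

print(output)
# [('Mumbai', 'Pune', 200), ('Pune', 'Hyderabad', 550)]
```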
Q52. How do you pass records to the output only if the input id is a prime number?
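For Q52, the primality check is the whole trick; it would normally live in a transform's select expression. A hedged Python sketch of the predicate (the component placement is a suggestion):

```python
# Trial-division primality test; in a graph this predicate would be the
# select expression of a Filter By Expression or Reformat component.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print([x for x in ids if is_prime(x)])  # [2, 3, 5, 7, 11]
```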
Q53. We have 4 files and have to unload the data from those 4 files. If even one record from a file is rejected, that whole file needs to be rejected. How can you achieve this?
Q54. In a file we have data like below:
1,murali,3,4,5
In the output file we need the data like this:
1,murali,3
1,murali,4
1,murali,5
How do you achieve this?
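Q54 is a classic Normalize scenario: the fixed fields are repeated once per element of the trailing vector. A Python sketch of the expansion (the field layout is assumed from the sample row):

```python
# Expand "1,murali,3,4,5" into one output row per trailing value,
# repeating the two fixed fields (id, name) each time.
line = "1,murali,3,4,5"
parts = line.split(",")
fixed, values = parts[:2], parts[2:]

out = [",".join(fixed + [v]) for v in values]
print(out)  # ['1,murali,3', '1,murali,4', '1,murali,5']
```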
Q55. In an organization, we have 2 files, say file1 and file2.
File1 has details of employees who are currently working in the organization as well as those who have left.
File2 has the currently working employees.
In output file1 we need to send the working employees, and in output file2 the employees who have left the organization.
How can you achieve this?
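For Q55, one way is a join on the employee id: file1 records that match file2 are the working employees, and file1 records with no match are the leavers (in a graph, the Join's unused port would carry the non-matching records). A Python sketch with made-up sample data:

```python
# file1: all employees (current + left); file2: ids of working employees.
# The names and ids below are illustrative, not from the question.
file1 = [(1, "asha"), (2, "ravi"), (3, "meena"), (4, "john")]
file2_ids = {1, 3}

working = [rec for rec in file1 if rec[0] in file2_ids]    # matched
left = [rec for rec in file1 if rec[0] not in file2_ids]   # unmatched

print(working)  # [(1, 'asha'), (3, 'meena')]
print(left)     # [(2, 'ravi'), (4, 'john')]
```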
Q56. We have a file, file1, containing 10 records. How can we send all these records to the reject port using a Filter By Expression component?
Q57. I have a graph like the below (for example):
input file --> join --> replicate --> sort --> output
In the above, the layouts of the input file and the join are set to multifile, whereas the sort and the output file are set to serial layout.
The graph ran fine. My question is: does Replicate behave as a Gather in this case? If yes, please explain why, because I was expecting a depth error here.
You can write your answers to us at AbiInterviewQuestions@gmail.com
Please share this link with people who are preparing for the Ab Initio interview:
https://t.me/abi_interview_qstn
For any queries and new question submissions, email us at
AbiInterviewQuestions@gmail.com
We will try to add new questions daily, so please keep visiting this group.
Scenario questions:
Q58. I have an EBCDIC file from a mainframe system. Its DML is something like:
record
  integer(4) id;
  void(120) bill_dtl;
end
Positions 80-85 and 100-104 hold some information (it is actually the amount and the product in the bill) which I need to extract. How do I do it?
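For Q58, once bill_dtl is read as a string (or reinterpreted with a more detailed DML), the fields can be cut out by fixed position; in DML that is string_substring(bill_dtl, start, length) with a 1-based start. A Python sketch with a fabricated 120-byte value:

```python
# Build a toy 120-byte bill_dtl with known content at the target positions.
bill_dtl = "x" * 79 + "001234" + "x" * 14 + "PROD1" + "x" * 16  # 120 chars

amount = bill_dtl[79:85]    # 1-based positions 80-85 (6 chars)
product = bill_dtl[99:104]  # 1-based positions 100-104 (5 chars)
print(amount, product)
```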
Q59. Suppose Sachin and Sehwag are batting in a match, and Sachin faces the first ball. How do you calculate the total runs scored by Sachin in a single over? Consider extra balls like wides and no-balls as well, along with 1, 3, and 5 runs (which rotate the strike).
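Q59 hinges on strike rotation: odd runs (1, 3, 5) swap the striker, and wides/no-balls are re-bowled. A simplified Python sketch (the ball encoding is an assumption, and for brevity it ignores runs a batsman may score off a no-ball):

```python
# Track who is on strike through one over and total Sachin's runs.
over = [1, "wd", 4, 3, 0, 2, 1]   # toy over: 6 legal balls + 1 wide

striker, non_striker = "Sachin", "Sehwag"
runs = {striker: 0, non_striker: 0}
legal_balls = 0
for ball in over:
    if ball in ("wd", "nb"):       # extra: re-bowled, no batsman runs here
        continue
    runs[striker] += ball
    legal_balls += 1
    if ball % 2 == 1:              # odd runs swap the strike
        striker, non_striker = non_striker, striker

print(runs["Sachin"])              # 4
```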
Q60. I have customer details as below,
cust_card_no item amount
10001 pen 10
10001 copy 20
10001 pen 10
10001 copy 20
10001 copy 20
Now my question: find the distinct count of items per customer, along with the total amount spent by each customer.
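Q60 maps to a rollup on cust_card_no that counts distinct items and sums the amount (in a graph, a Rollup component; here plain Python over the sample rows):

```python
rows = [
    (10001, "pen", 10),
    (10001, "copy", 20),
    (10001, "pen", 10),
    (10001, "copy", 20),
    (10001, "copy", 20),
]

# Per customer: a set of distinct items and a running amount total.
summary = {}
for cust, item, amount in rows:
    items, total = summary.get(cust, (set(), 0))
    items.add(item)
    summary[cust] = (items, total + amount)

for cust, (items, total) in summary.items():
    print(cust, len(items), total)   # 10001 2 80
```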
Q61. Suppose I have a scenario like below:
Input:
======
Id Col1 Col2
a 100 Null
a Null 200
b 300 Null
b Null 400
And I need output as:
Output:
=======
Id Col1 Col2
a 100 200
b 300 400
How do we get only the not-null values?
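Q61 is typically a rollup on Id where the transform keeps the non-null value of each column (in a graph, a Rollup component). A Python sketch with None standing in for NULL:

```python
rows = [
    ("a", 100, None),
    ("a", None, 200),
    ("b", 300, None),
    ("b", None, 400),
]

# For each Id, keep the first non-null value seen in each column.
out = {}
for rec_id, col1, col2 in rows:
    c1, c2 = out.get(rec_id, (None, None))
    out[rec_id] = (c1 if c1 is not None else col1,
                   c2 if c2 is not None else col2)

for rec_id, (c1, c2) in sorted(out.items()):
    print(rec_id, c1, c2)
```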
Q62. How do you achieve the below scenario in Ab Initio?
The input file has ball-by-ball contents (Ball Run), three overs in sequence:
Over 1: 1 1, 2 1, 3 1, 4 1, 5 1, 6 1
Over 2: 1 0, 2 1, 3 1, 4 1, 5 1, 6 1
Over 3: 1 1, 2 1, 3 1, 4 1, 5 1, 6 0
Required Output :
Over Runs
1 6
2 5
3 5
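One way to read Q62: a new over starts whenever the ball counter resets to 1, so assign over numbers with a running scan and then sum runs per over. A Python sketch:

```python
balls = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1),
         (1, 0), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1),
         (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0)]

over = 0
runs_per_over = {}
for ball_no, run in balls:
    if ball_no == 1:                 # ball counter reset => new over
        over += 1
    runs_per_over[over] = runs_per_over.get(over, 0) + run

print(runs_per_over)   # {1: 6, 2: 5, 3: 5}
```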
Q63. I have a record like below: avi,mahi,virat,mahi,avi,virat,mahi,virat,avi
Output should be: avi1 mahi1 virat1 mahi2 avi2 virat2 mahi3 virat3 avi3
Please suggest how I can achieve this.
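Q63 is a running occurrence count per name (note that virat appears three times in the input, so its tags run virat1 to virat3). In a graph this could be a Normalize over the vector plus a per-name counter; a Python sketch:

```python
record = "avi,mahi,virat,mahi,avi,virat,mahi,virat,avi"

counts = {}
out = []
for name in record.split(","):
    counts[name] = counts.get(name, 0) + 1   # running count per name
    out.append(f"{name}{counts[name]}")

print(" ".join(out))
# avi1 mahi1 virat1 mahi2 avi2 virat2 mahi3 virat3 avi3
```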
You can write your answers to us at AbiInterviewQuestions@gmail.com
Tip 11:
A component's layout specifies where the component runs (on what computer & in what directory) and the number of ways parallel the component runs.
The default layout lacks a directory path, so when this layout is used, the component runs in the directory specified by AB_WORK_DIR.
During execution, components might need to write temporary data to disk. In most cases these temporary files are maintained in the .WORK subdirectory of a component's layout.
For database components with a Database:default layout, the directory specified by AB_DATA_DIR provides disk storage for temporary files.
Questions and answers:
Q64. What is a broken lock ? How to fix it?
Ans:
If you want to edit a file, you must obtain a lock on that file. This prevents multiple users from modifying the same file at the same time. However, if another user wants to edit your locked file, he or she must first break your lock. The act of breaking a lock creates a broken lock.
A broken lock is effectively a message to the original lock owner that another user has broken the lock.
When you open a file with a broken lock, the GDE displays a message informing you that your lock has been broken. At that point, you may need to reconcile your changes with those made by other users. Once you click OK on the message, the GDE "resets" (removes) the broken lock.
At the command line, use air lock break to break locks and air lock reset to remove broken locks.
You cannot lock a file if you own a broken lock on that file. You or the EME administrator must first reset the lock. An EME administrator can reset user username's lock on file object-path by running:
air lock reset -user username -object object-path
Q65. What is the difference between a phase and a checkpoint?
Ans:
The essential differences between a phase and a checkpoint are their purpose and how the temporary files containing the data landed to disk are handled:
• Phases are used to break up a graph into blocks for performance tuning.
• Checkpoints are used for the purpose of recovery.
Details: The following descriptions clarify the differences between phases and checkpoints:
• Phase: The primary purpose of phasing is performance tuning by managing resources. Phasing limits the number of simultaneous processes by breaking up a graph into different pieces, only one of which is running at any given time. One common use of phasing is to avoid deadlocks. The temporary files created by phasing are deleted at the end of the phase, regardless of whether the run was successful.
• Checkpoint: The main aim of checkpoints is to provide the means to restart a failed graph from some intermediate state. When a graph with checkpoints fails, the temporary files from the last successful checkpoint are retained so you can restart the graph from this point in the event of a failure. Only as each new checkpoint is completed successfully are the temporary files corresponding to the previous checkpoint deleted.
Q66. What is difference between fan-out and partition by round robin?
Ans:
1. Both components work in a broadly similar way, but data skew can occur with fan-out, whereas Partition by Round-robin (PRR) distributes records evenly.
Input: 1 2 3 4 5 6 7 8 9 0
Suppose you are working with a 3-way partition. The fan-out output could be:
Partition 1: 1 4 5 8 0
Partition 2: 2 6 7 9
Partition 3: 3
Because the data does not flow to all partitions equally, the result is data skew, whereas this will not happen with PRR.
2. If the input layout and output layout of the component are the same, we cannot use fan-out, whereas PRR can be used:
Serial layout -> fan-out -> serial layout component (not possible)
Serial layout -> PRR -> serial layout component (possible)
3. Fan-out can work faster than PRR, as data flows on the fly via the data stream.
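The round-robin behaviour in point 1 is easy to picture: records are dealt to the partitions in turn, so the partition sizes can differ by at most one. A Python sketch of that dealing:

```python
records = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
n = 3
partitions = [[] for _ in range(n)]
for i, rec in enumerate(records):
    partitions[i % n].append(rec)    # deal records to partitions in turn

print(partitions)  # [[1, 4, 7, 0], [2, 5, 8], [3, 6, 9]]
```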
You can write your queries or suggestions to us at AbiInterviewQuestions@gmail.com
Tip 12:
There are 3 different ways to check in a file marked as Never Checkin:
at the command line,
from the EME MC and
from the GDE Checkin Wizard.
Details: Once a file is marked as Never Checkin in the GDE Checkin Wizard, it is not available for checkin through the wizard. The file is assigned the MIME type of ignore in the project files list.
To check in a file marked as Never Checkin:
• At the command line, run air project import on the file (say, dml/mydml.dml) with the -force and -files options (but not the -auto-add option):
air project import \
/Projects/lesson -basedir /disk1/data/sand/lesson \
-force \
-files dml/mydml.dml
Alternatively, run air project set-type to change the MIME type from ignore to one appropriate for the file. Then check in only the file (not its project).
• From the EME MC, use the Project Files List dialog. Double-click Ignored and change the value to No. Then choose another MIME type if the current one is not appropriate. Finally, check in only the file (not its project).
• From the GDE Checkin Wizard, check in only the file, selecting the Force overwrite check box on the Advanced Options dialog.
Tip 13:
We can commit intermediate results in the target table by creating a commit table in API mode.
Suppose a graph failed while loading 10 million records. When we rerun the graph, it will skip over the previously committed records.
Use the m_db create_commit_table
utility to create the commit table, and specify it in the commitTable parameter of the Output Table component. Also specify commitNumber, i.e. the number of rows to process before committing the records to the target table.
Questions and Answers:
Q67. Explain what a lookup is.
Ans1:
A lookup is basically a keyed dataset. It can be used to map values based on the data present in a particular file (serial or multifile). The dataset can be static as well as dynamic (e.g. when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash joins can be replaced by a Reformat using a lookup, if one of the inputs to the join contains a small number of records with a slim record length.
Ab Initio has built-in functions to retrieve values from the lookup using its key.
Ans2:
A Lookup File consists of data records that can be held in main memory. This lets the transform function retrieve records much faster than reading them from disk, and allows the transform component to process the data records of multiple files quickly.
Q68. Do you know what a local lookup is?
Ans1:
The lookup_local function retrieves the first matching record from a partitioned multifile, which is partitioned on a particular key.
Consider a lookup file "emp_detail" with the fields emp_id, emp_name, and dept, partitioned on emp_id.
Use:
lookup_local("emp_detail", "456A")
and you will get the matching record for "456A", where "456A" is one of the emp_id values.
Ans2:
If your lookup file is a multifile partitioned/sorted on a particular key, then the local lookup function can be used instead of the plain lookup function call. It is local to a particular partition, depending on the key, and works faster than the lookup function.
Q69. What is Range Lookup?
Ans:
A range lookup returns the first data record that falls within the range indicated by the lower-bound and upper-bound arguments you define.
Q70. How to retrieve multiple records from lookup file?
Ans:
We need to use the lookup() or lookup_match() function to get the first matching record. After that we use the lookup_next() function, which returns the subsequent matching records (of the lookup's record type), one per call.
Tip 14:
Conditional DML
DML whose record format is chosen conditionally, based on the value of a field already read, is known as conditional DML.
Suppose we have data that includes a header, main data, and a trailer, as given below:
10 This data contains employee info.
20 emp_id emp_name salary
30 count
So the DML for the above structure would be:
record
  decimal() id;
  if (id == 10)
  begin
    string() empl_info;
  end
  else if (id == 20)
  begin
    string() empl_id;
    string() name;
  end
  else if (id == 30)
  begin
    decimal() count;
  end
end;
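The behaviour of conditional DML can be mimicked in plain code: the record-type field read first decides how the rest of the record is interpreted. A Python sketch with fabricated sample lines:

```python
# Toy lines: "10" = header, "20" = employee detail, "30" = trailer count.
lines = [
    "10 This data contains employee info.",
    "20 emp1 john 50000",
    "30 2",
]

parsed = []
for line in lines:
    rec_id, rest = line.split(" ", 1)
    if rec_id == "10":
        parsed.append(("header", rest))
    elif rec_id == "20":
        parsed.append(("detail", rest.split()))
    elif rec_id == "30":
        parsed.append(("trailer", int(rest)))

print(parsed)
```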
Tip 15:
Environment variables:
Environment variables serve as global variables in the UNIX environment. They are used for passing values from one shell/process to another.
They are inherited by Ab Initio as sandbox variables/graph parameters, such as:
AI_SORT_MAX_CORE
AI_HOME
AI_SERIAL
AI_MFS, etc.
To see which such variables exist in your UNIX shell, find the naming convention and type a command like:
env | grep AI
This will give you a list of all matching variables set in the shell. You can refer to the graph parameters/components to see how these variables are used inside Ab Initio.
Questions and Answers:
Q71. How do you know whether a file is a multifile or a serial file?
Ans:
By using m_expand -n <file-name>
If the answer is 1, it is a serial file.
If the answer is >1, it is a multifile.
Q72. How do you find out the latest version of a particular graph? Syntax?
Ans:
air object versions /Projects/dev/esi/esi_trans/mp/esi_trans_trade_ism_load.mp