TIBCO Business Events Interview Questions

  1. What Is TIBCO BE And Why Is It Used?

    TIBCO BusinessEvents (BE) is a well-known complex event processing system used to correlate and process business events, draw useful meaning from them, better predict business changes, and take appropriate actions accordingly.

  2. What Is The Role Of Channels And Destinations In TIBCO BE?

    Channels are resources which are used to enable connectivity and communication between TIBCO BE and other sources like JMS sources, RV sources or HTTP sources.

    Destinations are defined within a channel and specify the source and sink for messages. For example, when you create a destination for a JMS channel, it contains details such as the destination queue name, delivery mode, etc.

  3. How Are Events Generated In TIBCO BE?

    Event instances get created based on the messages coming as input from the channels.

  4. What Are Rules And How Do TIBCO BE Rules Work?

    In TIBCO BE, Rules specify the actions that need to be taken based on certain conditions. Rules are triggered based on events when conditions are met.

  5. What Is The Difference Between Rule Functions And Virtual Rule Functions?

    Rule Functions are functions written in the rule language with a complete body, while Virtual Rule Functions are like interfaces without a body.

    The body of a Virtual Rule Function is instead implemented through decision tables.

  6. What Is The Relationship Between Decision Tables And Virtual Rule Functions?

    Decision tables are the body implementation of Virtual Rule Functions. A Virtual Rule Function can have one or more decision tables for its body implementation.

  7. What Is RMS And Why Is It Used?

    Rule Management Server (RMS) is a component of Business Events, which manages decision projects and provides a mechanism for approval. It also provides user authentication, decision project authorization, and other project management features. Decision Manager communicates with Rules Management Server to check out decision projects, update local copies of decision tables, and commit changes. RMS users can then approve or reject those changes.

  8. How Can We Prioritize And De-prioritize Rules For An Event?

    For a certain event, we can have multiple rules available. The Priority value of any rule decides the sequence in which rules are triggered. A value closer to 1 means higher priority.

  9. Describe The Purpose And Usage Of TIBCO BE Concepts.

    Concepts are created to hold the properties of any entity. Normally, information from the Events is used to create instances of the Concepts in the Rules and Rules Functions.

  10. What Is An Event Preprocessor And Why Is It Used?

    Event Preprocessor is basically a Rule Function. This rule function is used to process the incoming messages before they are converted into Events.

  11. Why Are Scorecards Used In TIBCO BE?

    A scorecard is a type of concept in BE. A scorecard acts like a static variable in a programming language, with project-wide scope and only a single instance. Scorecards are used to track or store information that has to be available throughout the inference agent.

  12. Describe The Flow Of Messages In TIBCO BE.

    • Messages are received through channels with specified destinations.
    • The event preprocessor is executed first (if it exists in the project).
    • The incoming message is converted to an event.
    • Rules are triggered based on the event, as sketched below.
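
    The same flow can be pictured with a small, purely illustrative Python sketch. This is not TIBCO BE's rule language or API; the function and field names are hypothetical and only mirror the channel → preprocessor → event → rules sequence described above.

        # Conceptual sketch only - plain Python, not TIBCO BE's actual API.
        def handle_message(message, preprocessor=None, rules=()):
            if preprocessor is not None:
                message = preprocessor(message)        # event preprocessor runs first, if defined
            event = {"type": message.get("type"), "payload": message}   # message is converted to an event
            for rule in sorted(rules, key=lambda r: r["priority"]):     # value closer to 1 = higher priority
                if rule["condition"](event):
                    rule["action"](event)              # the rule's action fires when its condition is met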

  13. What Is CDD And What Is Its Significance?

    CDD (Cluster Deployment Descriptor) is an XML file which contains all the required information about the deployment of a TIBCO BE project.

  14. How Can We Integrate TIBCO BW With TIBCO BE?

    Based on the type of channel configured in TIBCO BE, you can send messages from TIBCO BW and receive the responses. For example, if TIBCO BE has a JMS channel configured, you can send JMS messages to the specified destination from TIBCO BW using the Send JMS Message activity and then receive a response as well using the Receive JMS Message activity.

    Similarly, you can communicate for HTTP, SOAP, RV or any other types of channels as well from TIBCO BW.

Statutory Compliance Interview Questions

  1. What Is A PF Year?

    PF accounting is done from March to February of every year. This period is called the PF Year. This is applicable only to PF and does not have any effect on any other modules.

  2. Does Every Employee Contribute To PF?

    According to the PF rules (as of 2004), all employees who draw a PF Basic of less than Rs. 15000 have to contribute to PF. This is mandatory. For employees having PF Basic more than Rs. 15000, the deduction/contribution will be on 15000. However in most companies, PF is applicable to all employees. There are exceptions (people who do not contribute) but these are few.

  3. What Is PF Basic Or PF Gross?

    This is the amount on which the PF contribution or deduction is calculated. Generally in 95% of the companies this is Basic + Basic Arrears + DA + DA Arrears. However, there are always exceptions and some companies want it to be a little different. In that case, we can change the PF Basic formula to cater to the client’s requirements.

  4. What Are The Various Reports Associated With PF?

    • PF Monthly report:
      This is a report that needs to be generated every month and it contains the details of Employee PF, Employer PF, VPF, and PF Basic for each employee. Based on the numbers appearing in this report, the company needs to make payments to the government every month.
    • PF – Form 5:
      This is a monthly report giving a list of all employees who have joined the company in the current month. Based on an option setting, this report can also indicate all the employees whose payroll has been processed for the first time in this month.
    • PF – Form 10:
      This is a monthly report giving a list of all employees who have left/resigned the company in the current month. Based on an option setting, this report can also indicate all the employees whose PF has been settled in this month.
    • PF – Form 12A:
      This is a monthly report giving a summary of the number of employees currently covered under PF, total contribution for the month, and other details.
    • PF – Form 3A:
      This is a yearly report giving a summary of all the PF deductions (employee and employer) done for an employee in a given PF Year.
    • PF – Form 6A:
      This is a yearly report giving a summary of Form 3A per employee. This report gives the total deductions done for each employee in a given PF Year.

  5. What Are The Various Options For PF Form 10?

    Form 10 can be generated based on either Leaving Date or Settlement Date. This can be set in the General Options. The option name is USE_SETTLEDATE_IN_FORM10. If you set this option to YES then the Form 10 report will list all employees who are settled in the given date range. Otherwise, it will pick up all employees who have resigned in the given date range.

  6. What Is The Link Between PF And Income Tax?

    As per Indian Tax laws, any contribution done by the employee towards PF can be considered for Section 88 rebate. 

    In Folklore, any deduction done for PF and VPF is automatically considered for Section 88 rebate in the Income Tax module.

  7. What Are The Various ESI Reports?

    ESI Monthly report: this report is generated monthly. This gives the total deductions/contribution (employee and employer) done for each employee for a month.

    ESI Form 6: this is a half yearly report.

    ESI Form 7: this is also a half yearly report.

  8. What Are The Other Rules Governing ESI?

    ESI contribution stops when the employee’s gross income crosses Rs. 15000/- (up to May 2010 this limit was Rs. 10000/-).

    • This stoppage of ESI cannot happen in just any month. The changeover from ESI deduction to no ESI deduction can happen only in the 3rd (March) or 9th (September) month, i.e. even if an employee’s salary crosses Rs. 10000 in July, the ESI deduction will continue until August.
    • In the September payroll, the employee’s ESI deduction will be zero (as the salary has crossed Rs. 10000/-).
    • The comparison of income is done with the full income, while the calculation of ESI is done on the actual payout (income).
  9. An Employee Has ESI Deduction But Is Not Appearing In The ESI Report. What Could Be The Issue?

    The ESI report considers a field called ESI Eligibility. An employee’s name will appear in the ESI report only if the employee is marked as ESI Eligible in the employee information.

    Verify if PF Eligible and ESI Eligible are selected in the Employee Information page of your application.

  10. What Are The ESI Rules?

    The ESI scheme being contributory in nature, all the employees in the factories or establishments to which the Act applies shall be insured in the manner provided by the Act. The contribution payable to the Corporation in respect of an employee comprises the employer’s contribution and the employee’s contribution at a specified rate. The rates are revised from time to time. Currently, the employee’s contribution rate (w.e.f. 1.1.97) is 1.75% of the wages and that of the employer is 4.75% of the wages paid/payable in respect of the employees in every wage period. Employees in receipt of a daily average wage up to Rs. 70/- are exempted from payment of contribution. Employers will, however, contribute their own share in respect of these employees.

  11. What Are The Rates For Contribution (Deduction) To ESI?

    For employees, ESI deduction rate is 1.75% of the ESI Gross salary.

    For employer, ESI contribution is at the rate of 4.75% of the ESI gross salary.

  12. What Is ESI Gross Salary?

    Generally, all the income components paid to an employee are considered for ESI Gross computation. In Folklore, ESI Gross is defined by the user, who indicates which income components are to be considered for ESI Gross.

  13. How To Do The PF Process For Existing Employees Using greytHR On A Monthly Basis?

    1. Payroll → Process Payroll → Quick process (Your payroll will be processed here)
    2. Go to Reports → Search for ECR → Select the ‘PF – ECR Format’ report from the search result → then select ‘Regular’ as the type → Click generate ECR file
    3. A text file is then downloaded to your system; the PF admin user should log in to the PF Unified Portal and upload the ECR text file generated from greytHR
  14. How To Do The PF Process For Resigned Employees Using greytHR?

    1. Under the Employee tab → Information → Separation → Select the employee and furnish the resignation details (tentative relieving date, etc.)
    2. Once the employee has left the organisation (after serving the notice period), do the following:
      Payroll → Payroll Inputs → Final settlement → Select the employee → Finish the settlement process
    3. Payroll → Process Payroll → Quick process (Your payroll will be processed here)
    4. Go to Reports → Search for ECR → Select the ‘PF – ECR Format’ report from the search result → then select ‘Exit Employees’ as the type → Click generate ECR file
      A text file is then downloaded to your system; the PF admin user should log in to the PF Unified Portal and upload the ECR text file generated from greytHR
  15. Where To Enter The UAN Number In greytHR?

    Under Employee → Information → PF Details add all the details

    Payroll → Process Payroll → Quick process (Your payroll will be processed here)

    Go to Reports → Search for ECR → Select the ‘PF – ECR Format’ report from the search result → then select ‘New Joinees’ as the type → Click generate ECR file

    A text file is then downloaded to your system; the PF admin user should log in to the PF Unified Portal and upload the ECR text file generated from greytHR

  16. How To Do The PF Process In greytHR For A New Entrant Who Has Never Previously Been A Contributing Member Of PF?

    Login to greytHR → go to Actions tab → Select the ‘Add employee link’ → Enroll an employee into greytHR by filling in all his necessary details, under the PF section – enable ‘Eligible for PF’ (Select ‘Regular PF Contribution (Max 1800)’  (or) ‘Excess PF Contribution (Above 1800)’)

    Once you click ‘finish’ after enrolling an employee, on the same screen under related action select ‘Add Salary’

    Enter only the Annual CTC of the employee (System will automatically do the component wise salary breakup)

    The PF Admin user needs to log in to the Unified PF Portal with the relevant KYC for the new member → Upload them and generate the UAN

    Enter the generated UAN number and the PF number in greytHR

  17. What Is The Family Pension Fund (FPF) Or Employee Pension Scheme (EPS)?

    The contribution made by an employer towards PF is split into contribution towards PF and contribution to another scheme called Pension Scheme.

  18. I Have An Error While Exporting Form 3A To EDI Format. What Do I Verify?

    Verify if the ODBC driver for PHP is installed on the PC where the export is happening. This is needed as the EDI file is generated in DBF format.


Static Timing Analysis Interview Questions

  1. What Is Positive Slack?

    If the difference between the required arrival time and the actual arrival time is positive, it is called positive slack. If there is positive slack, the design meets the timing requirements and can still be improved further.

  2. In Back-end Design, Which Violation Has More Priority? Why?

    In back-end design, a hold violation has more priority than a setup violation, because a hold violation is related to the data path and does not depend on the clock. A setup violation can be eliminated by slowing down the clock (increasing the clock period), whereas a hold violation cannot be fixed that way.

  3. What Is Negative Slack?

    If the difference between the required arrival time and the actual arrival time is negative, it is called negative slack. If there is negative slack, the design does not meet the timing requirements, and the paths which have negative slack are called violating paths. We have to fix these violations to make the design meet timing.

  4. What Is Slack?

    The difference between the Required Arrival Time and the Actual Arrival Time is called slack. It is the amount of time by which a violation (either setup or hold) is avoided, as shown below.
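
    Written compactly (RAT = required arrival time, AAT = actual arrival time), this is:

    \[
    \text{slack} = \text{RAT} - \text{AAT}, \qquad \text{slack} \ge 0 \Rightarrow \text{timing met}, \qquad \text{slack} < 0 \Rightarrow \text{violating path}
    \]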

  5. How Can You Avoid Hold Time Violations?

    1. By adding delays using buffers
    2. By adding lockup-latches

  6. What Is Static Timing Analysis (STA)?

    Static timing analysis is a method for determining if a circuit meets timing constraints without having to simulate. So, it validates the design for desired frequency of operation, without checking the functionality of the design.

  7. What Is Setup Time?

    Setup time is the amount of time before the clock edge for which the input signal needs to be stable to guarantee it is properly accepted on the clock edge.

  8. What Is Hold Time?

    Hold time is the amount of time after the clock edge for which the input should be stable to guarantee it is properly accepted on the clock edge.

  9. What Are Setup And Hold Time Violations?

    Violating the above setup and hold time requirements results in setup and hold time violations. If there are setup or hold time violations, the design does not meet the timing requirements and the functionality of the design is not reliable. STA checks for these setup and hold violations.
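
    To make the two checks concrete, here is a minimal Python sketch for a single flop-to-flop path, assuming an ideal clock (no skew or jitter) and illustrative delay numbers in nanoseconds:

        # Setup: data launched at one edge must arrive t_setup before the next edge.
        def setup_slack(t_clk, t_clk_to_q, t_comb_max, t_setup):
            required_time = t_clk - t_setup
            arrival_time = t_clk_to_q + t_comb_max
            return required_time - arrival_time        # positive: setup met, negative: violation

        # Hold: data must stay stable for t_hold after the capturing edge.
        def hold_slack(t_clk_to_q, t_comb_min, t_hold):
            return (t_clk_to_q + t_comb_min) - t_hold  # positive: hold met, negative: violation

        print(setup_slack(t_clk=10.0, t_clk_to_q=0.5, t_comb_max=8.0, t_setup=0.6))  # 0.9 ns
        print(hold_slack(t_clk_to_q=0.5, t_comb_min=0.2, t_hold=0.3))                # 0.4 ns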

  10. How Can You Avoid Setup Time Violations?

    1. Play with (useful) clock skew.
    2. Redesign the flip-flops to get a lesser setup time.
    3. Optimize the combinational logic between flip-flops to get minimum delay.
    4. Tweak the launch flip-flop to have a better slew at the clock pin; this makes the launch flip-flop faster, thereby helping to fix setup violations.

In Process QA (IPQA) Interview Questions

1. How Many Tablets Shall Be Taken For Checking Friability?

For tablets with a unit mass equal to or less than 650 mg, take a sample of whole tablets corresponding to 6.5 g. For tablets with a unit mass of more than 650 mg, take a sample of 10 whole tablets.

2. What Is The Formula For Calculating Weight Loss During Friability Test?

% Weight loss = [(Initial Weight – Final Weight) / Initial Weight] × 100
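
A quick worked example with illustrative numbers (not taken from any monograph): if 6.50 g of tablets weigh 6.45 g after tumbling,

\[
\%\,\text{Weight loss} = \frac{6.50 - 6.45}{6.50} \times 100 \approx 0.77\%
\]

which is within the commonly used 1.0% limit mentioned below.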

3. What Is The Pass Or Fail Criteria For Friability Test?

Generally the test is run once. If any cracked, cleaved or broken tablets are present in the tablet sample after tumbling, the tablets fail the test. If the results are doubtful, or the weight loss is greater than the targeted value, the test should be repeated twice and the mean of the three tests determined. A mean weight loss from the three samples of not more than 1.0% is considered acceptable for most products.

4. What Is The Standard Number Of Rotations Used For Friability Test?

100 rotations

5. What Is The Fall Height Of The Tablets In The Friabilator During Friability Testing?

6 inches. Tablets fall from a height of 6 inches in each turn within the apparatus.

6. Why Do We Check Hardness During Inprocess Checks?

To determine the need for pressure adjustments on the tableting machine. Hardness can affect the disintegration time. If a tablet is too hard, it may not disintegrate in the required period of time; if it is too soft, it will not withstand handling and subsequent processing such as coating, packing, etc.

7. What Are The Factors Which Influence Tablet Hardness?

      1. Compression force
      2. Binder quantity (more binder, more hardness)
      3. Moisture content

8. Which Type Of Tablets Are Exempted From Disintegration Testing?

Chewable Tablets

9. Which Capsule Is Bigger In Size – Size ‘0’ Or Size ‘1’?

‘0’ size

10. What Is The Recommended Temperature For Checking Dt Of A Dispersible Tablet?

25 ± 1°C (IP) & 15 – 25°C (BP)

11. What Is Mesh Aperture Of Dt Apparatus ?

1.8 – 2.2 mm (#10)

12. List Out The Appearance Defects Of Tablets During Compression Activity?

Appearance Defects

1.Capping:-
‘Capping’ is the term used, when the upper or lower segment of the tablet separates horizontally, either partially or completely from the main body of a tablet and comes off as a cap, during ejection from the tablet press, or during subsequent handling.

2.Lamination / Laminating:-
Definition: ‘Lamination’ is the separation of a tablet into two or more distinct horizontal layers.

3.Sticking/Filming:-
‘Sticking’ refers to the tablet material adhering to the die wall. Filming is a slow form of sticking and is largely due to excess moisture in the granulation.

4.Cracking:-
Small fine cracks observed on the upper and lower center surface of the tablets, or very rarely on the side wall, are referred to as cracks.

5.Chipping:-
‘Chipping’ is defined as the breaking of tablet edges while the tablet leaves the press or during subsequent handling and coating operations.

6.Mottling:-
‘Mottling’ is the term used to describe an unequal distribution of colour on a tablet.

7.Double Impression:-
‘Double impression’ involves only those punches which have a monogram or other engraving on them.

13. What Is The Pass/fail Criteria For Disintegration Test?

If one or two tablets/capsules fail to disintegrate completely, repeat the test on 12 additional dosage units. The requirement is met if not fewer than 16 of the total of 18 tablets/capsules tested disintegrate completely.

14. What Is The Recommended Storage Conditions For Empty Hard Gelatin Capsules?

15 – 250C & 35 -55% RH

15. Which Method Is Employed For Checking “uniformity Of Dosage Unit”?

A. Content uniformity

B. Weight Variation

Weight variation is applicable for the following dosage forms:

Hard gelatin capsules, and uncoated or film-coated tablets, containing 25 mg or more of a drug substance comprising 25% or more, by weight, of the dosage unit.

16. What Is The Recommended Upward And Downward Movement Frequency Of A Basket-rack Assembly In A Dt Apparatus?

28 – 32 cycles per minute.

17. When Performing The ‘uniformity Of Weight’ Of The Dosage Unit, How Many Tablets/Capsules Can Deviate From The Established Limit?

Not more than two of the individual weights can deviate from the average weight by more than the percentage given in the pharmacopoeia, and none can deviate by more than twice that percentage.

Weight Variation limits for Tablets

IP/BP                                    Limit     USP
80 mg or less                            10%       130 mg or less
More than 80 mg and less than 250 mg     7.5%      130 mg to 324 mg
250 mg or more                           5%        More than 324 mg

Weight Variation limits for Capsules

IP                      Limit
Less than 300 mg        10%
300 mg or more          7.5%

18. What Needs To Be Checked During Inprocess Qa Checks?

      1. Environmental monitoring
      2. Measured values obtained from the process equipment (e.g. temperature, RPM, etc.)
      3. Measured values obtained from persons (e.g. timings, entries, etc.)
      4. Process attributes (e.g. weight, hardness, friability, etc.)

19. What Precautions Shall Be Taken While Collecting Inprocess Samples ?

While collecting in-process samples, avoid contamination of the product being sampled (do not collect samples with bare hands) and avoid contamination of the sample taken.

20. In A Tablet Manufacturing Facility ‘positive’ Pressure Is Maintained In Processing Area Or Service Corridors?

In tablet manufacturing facilities, pressure gradients are maintained to avoid cross contamination of products through air. Usually service corridors are maintained under positive pressure with respect to processing areas.

21. If Sticking Is Observed During Tablet Compression, What May Be The Probable Reasons For The Same?

      1. If the granules are not dried properly, sticking can occur.
      2. Too little or improper lubrication.
      3. Too much binder.
      4. Hygroscopic granules.

22. What Checks Shall Be Carried Out, While Calibrating Dt Apparatus?

While calibrating the DT apparatus, the following checks shall be performed.

      1. Number of strokes per minute (Limit: 29 – 32 cycles/min)
      2. Temperature by probe and standard thermometer (Limit: 37 ± 1°C)
      3. Distance travelled by the basket (Limit: 53 – 57 mm)

23. What Is The Difference Between Calibration And Validation?

Calibration is the demonstration that a particular instrument or measuring device produces results within specified limits, by comparison against a reference standard that is traceable to a national or international standard. Validation is a documented program that provides a high degree of assurance that a specific process, method or system will consistently produce a result meeting predetermined acceptance criteria.

24. What Is In Process Checks?

In-process checks are checks performed during an activity in order to monitor and, if necessary, adjust the process to ensure that the product conforms to its specification.

25. What Is The Difference Between Disintegration And Dissolution?

Disintegration is a disaggregation process in which an oral dosage form falls apart into smaller aggregates (disintegration time is the ‘break up’ time of a solid dosage form). Dissolution is the process by which the drug substance passes into solution; the dissolution test measures the rate and extent to which the active ingredient goes into solution from the dosage form.

26. Why Do We Calibrate A Qualified Equipment/instrument On Definite Intervals?

An equipment or instrument can ‘drift’ out of accuracy between the time of qualification and actual use. So it is recommended to calibrate and recalibrate the measuring devices and instruments at predetermined time intervals, to gain confidence in the accuracy of the data.

27. Why Do We Consider Three Consecutive Runs/batches For Process Validation? Why Not Two Or Four?

The number of batches produced in the validation exercise should be sufficient to allow the normal extent of variation and trends to be established and to provide sufficient data for evaluation and reproducibility.

  • The first batch’s quality could be accidental,
  • the second batch’s quality could be co-incidental,
  • the third batch’s quality confirms consistency (validation).

With 2 batches we cannot assure the reproducibility of the data; 4 batches could be taken, but additional time and cost would be involved.

28. Position Of Oblong Tablets To Be Placed In Hardness Tester To Determine The Hardness? Lengthwise / Widthwise?

Oblong tablets should be positioned lengthwise, because the probability of breakage is higher in this position.

Siebel System Admin Interview Questions

  1. Define OOTB Functionality In Siebel.

    OOTB stands for Out Of The Box. OOTB is also referred to as ‘vanilla’. It is the base functionality, i.e. the Siebel standard at the time of installation, before any configuration or changes are made to Siebel.

  2. Explain How To Create An Event Handler.

    1. Click on Administration → Communications → All Event Handlers.
    2. Click to add a new record in the Event Handler list.
    3. Enter the details of the event handler –

    • name
    • specify the configuration to associate this event handler
    • specify the event response to associate the event handler 
    • specify the profile for associating the event handler
    • name of the device event,
    • Order of event handler checking and comments, if any.

  3. Explain Audit Trail In Siebel.

    An audit trail generates a history of the changes that were made to data in Siebel applications. It is a record that shows who accessed an item, the operation performed, and the way the value changed. Hence it is useful for examining the history of particular records and for documenting changes for further analysis and record keeping. The audit trail persists this data without requiring any input from the users.

  4. Explain In Brief About siebmtsh.exe.

    siebmtsh.exe is one of the processes created when the Siebel server comes up. The number of siebmtsh.exe processes increases depending on the number of object managers that are enabled.

  5. What Are Caching And Purging In Siebel?

    Caching is the process of saving query results for later use when the same query is requested again. By using the cache, the cost of database processing is paid only once for a query, not every time the query is run.

    Purge is a method of the EAI Data Mapping Engine, used only in development mode. The Purge method is used to purge the existing map from the database. It is used when changing a map and is run after the map changes.

  6. What Is Link Specification?

    Link Specification is a property of the field object type. When its value is set to TRUE for a certain field, that field’s value can be retrieved in the child business component.

  7. What Are Bounded And Unbounded Picklists?

    Bounded picklist:
    Users cannot enter values other than those specified in the drop-down.

    Unbounded picklist:
    The user can enter values outside the drop-down.

  8. What Is The Difference Between An MVG Applet And A Pick Applet?

    Pick Applet:
    A join-based applet that displays every record available in the joined table. For example, if a pick applet of addresses is created on the Accounts applet, all the addresses available in the database are displayed.

    MVG Applet:
    This applet is based on a 1:M or M:M link and displays the child records related to the parent record that invokes the MVG; records can be added, queried and deleted through the MVG applet.

  9. What Are The 3 Major Steps In The Event Handler Process?

    1. An event occurs, such as a call being disconnected. The event is forwarded by the telephony switch, which communicates with the middleware server.

    2. The event is forwarded by the middleware server, which communicates with the communications client business service.

    3. The communications client business service receives the event and executes any action defined in the configuration data, or forwards the event to business service methods, or to Siebel VB or Siebel eScript code.

  10. What Are The Functionalities Performed By Siebel?

    Siebel is a software firm that designs, develops, markets and supports Customer Relationship Management (CRM) products.

    The company produces CRM applications and provides other functionalities like:

    1. Application-Layer Switching:
      this provides the content distribution of an application within many application servers. It increases the overall performance of the application and the system. The traffic distribution can handle the increased demands of the client. 
    2. TCP/IP Multiplexing and Connection Management:
      allow the reduction to be done in the connections provided through the server. It manages and allows the server to reduce the infrastructure. 
    3. Web Compression:
      it improves the performance of the data that is sent from the server to the client side. The redundant data is removed from the messages that are passed to the clients. 
    4. Application Data Caching:
      it allows the frequently used data to be stored in the cache and server the purpose of increasing the performance on the client side and reduces the load on the servers.

  11. How Is Load Balancing Maintained On The Server?

    Many applications use servers to fulfil the requests coming in for data, applications or other content. Load balancing provides the capability to give the client a user-friendly environment for accessing applications from the server. Load balancing on the server fronts the back-end application or database servers. The servers allow similar applications to be grouped in one place, and the services also get grouped in a single place by using the grouping functionality.

    The services that are grouped together map to one or more servers and are presented as a single logical entity. This allows an environment in which the application is easy to upgrade and maintain.

  12. What Are The Deployment Options Used For Load Balancing With NetScaler?

    NetScaler is used with Siebel to distribute requests from client browsers across the servers, complementing the load balancing features provided by the Siebel servers. The servers in this case do not require any configuration changes, and NetScaler can be deployed easily without delay. The load balancer communicates directly with the application servers.

    • Native Siebel load balancing can be configured only after installing the Siebel servers. It uses the server instances in round-robin fashion, and the load balancing configuration file is generated by the Server Manager with the "generate lbconfig" command.
    • The other option is to take the request from the client browser and balance the load using NetScaler across the interconnected servers; NetScaler then decides which server should handle the load in the most convenient manner.

  13. Explain The Steps Required To Implement The Load Balancer On Siebel Servers.

    The load balancer is set up for the Siebel Application servers by editing the given configuration file. The modifications required are made in the configuration file, subject to the validation that the server applies to the file.

    The steps that are required:

    – Update the eapps.cfg file to disable native Siebel load balancing, by changing the setting defined in the file to:

    EnableVirtualHosts = False

    – Modify the Object Manager connect string so that it points to the Virtual IP and the port used to connect to the Object Manager. The load-balanced Object Manager connect string takes the following form (the placeholders are explained in the next question):

    ConnectString = siebel.TCPIP.None.None://<VirtualIP>:<Port>/<EnterpriseName>/<ObjectManagerAlias>

  14. Explain The Elements Used In The Following Statement, With An Example.

    ConnectString = siebel.TCPIP.None.None://<VirtualIP>:<Port>/<EnterpriseName>/<ObjectManagerAlias>

    – This string is used when updating the configuration file; it points to the Virtual IP and the port that are being used. The elements used inside it are as follows:

    <VirtualIP>: the IP address associated with the virtual server that is defined in NetScaler. This is the address mentioned in the Siebel Application server configuration.

    <Port>: the port number of the service defined on the virtual server. The default port of that virtual server is 2321.

    <EnterpriseName>: the name of the Siebel Enterprise that contains the load-balanced Siebel Servers.

    <ObjectManagerAlias>: the alias given to the load-balanced Object Manager residing on the server; the Object Manager handles the requests and provides the object management needed for load balancing where the traffic on the server is high.

    The view of the eapps.cfg configuration file is shown as:

    #### Original eapps.cfg file ####

    EnableVirtualHosts = true

    [/callcenter_enu]

    ConnectString =

    siebel.TCPIP.None.None://VirtualServer/SBA_80/SCCObjMgr_enu

    ### Updated changes to the eapps.cfg ###

    EnableVirtualHosts = false

    [/callcenter_enu]

    ConnectString =

    siebel.TCPIP.None.None://IP_Address/SBA_80/SCCObjMgr_enu

  15. Explain The Various Steps Involved Prior To Configuring NetScaler.

    The various steps to be followed prior to configuring Netscaler are:

    1. Check that the network topology is well planned and configured properly. The load-balanced servers need to be placed in the same subnet. This affects the network and the way it is constructed.

    2. Select a plan in which the Virtual IP address to be used and the port are decided; only after this configuration can the load balancing be set up. This keeps the configuration simple and makes the server easy to implement.

    3. Installing of the NetScaler takes place in the Data Center that is connected through switches and hubs and allow other parts to be connected for the communication. 

    4. Initialization of NetScaler is done using the license keys provided by the vendor and then assigning of the IP address is also done. 

    5. Setup allows all the high performance to be selected so that the server can be considered in high availability. 

    6. Configuration of the Network Gateway, Subnet, and other networking components is being done for the use of VLANs that is used by NetScaler. This makes the server to see which server can be used for load balancing.

    7. Set up of the machines allows the application and the networking to be configured properly. This also allows different components to be connected to the server properly.

  16. Explain The Restrictions In Siebel That Come While Planning The Networking Topology.

    The restriction that is given while planning the networking topology is:

    On each instance, the URL that is given for an object points to one of the connect strings defined for an Object Manager (OM). The virtual IP address and the port are combined to give the complete URL.

    The application consists of one VIP (Virtual Internet Protocol) that can be shared using the multiple applications and allows it to be connected to different servers that provide tools to do load balancing together. 

    The customer that is running the application uses different channels by using the concept of partitioning:

    Different VIPs are configured and multiple as well for a Siebel application. The servers are partitioned according to the VIPs that are used and that are configured. The multiple IP address is not necessary to perform the job.

  17. Explain The NetScaler One-arm Mode Deployment Model.

    NetScaler is deployed to provide high availability for the servers, allowing the application to run on several platforms without getting stuck. NetScaler one-arm mode allows the Siebel environment to be integrated without changes to any existing physical server or to the network. The infrastructure provided with it is highly transparent and offers easy features and tools for upgrades. The Siebel applications do not require any changes in order to be load balanced.

    The load balancers handle the network topology and layout, which the administrator deploys with the correct information. The Siebel administrator provides a working setup of the TCP/IP addresses and the DNS entries used to communicate between the Siebel servers and the Virtual IPs that are configured through NetScaler.

  18. What Is The Use Of Global Policy Expressions Defined In Siebel?

    The global policy expression allows setting the condition that is allowed to enter the content using the NetScaler system. The expression is used to represent the conditions and allow the policies to be made according to it. The policy expression can be shared between many systems and the components. This policy expression includes compression, integrated caching that allows the saving of the data source that is being visited and to increase the performance saved in the cache.

    Policy expression uses content switching features to enable the services controlled by the policies that are made. The policy expression can be created using the configuration utility. This is the feature node that the system node uses of the NetScaler. System node includes the global repository that is used to provide the benefit of the system administrator’s responsibility for the expressions.

  19. What Are The Steps Required To Set The Policy Expression?

    There are steps to setup the policy expression in the system node as it includes the overall repository of the system and it is necessary to maintain the registry of the system. The system node uses the static caching of the system for that policies can be created. 

    The steps those are required as follows:

    For the static caching the name of the policy expression gets created for the HTTP objects that need to be accessed and allowed to be controlled by the system node. The objects are like images consisting of gif, jpg, png, etc., JavaScripts that are allowed to provide the interaction between the client and the server, css that gives the visual representation of the overall design and the system to make it visually more appealing. The name of the policy expressions are added to show the increased compression like js and js_content_type.

  20. How Is NetScaler Static Caching Maintained?

    HTTP caching is maintained without the knowledge of any technique that is provided with the platform on which the Siebel gets installed. The services are made to transparent to allow the caching to be dynamic and there is a caching that takes place for both dynamic and static content. The dynamic content caching is saved as long as the program runs and the static content are saved till the user removes the cache from the browser.

    The examples of Image file can be taken in this case that allow the static content to be cached by using NetScaler. The dynamic generation of the application content allow it not be cache-able as it allows more space and logic to store the dynamic content. The caching can be adjusted by changing the maximum size and header string and other changes can be done by going to the settings area.

  21. What Is The Procedure To Configure Static Caching Using The Content Group?

    The content group is used to cache the objects served through NetScaler. This cache is the integrated cache, of which the content group is a member, and it shows the association between the objects. The association is established when an object is downloaded or stored in the system at its specified location. The association is represented in the policy, which results in the caching of the object so that it is available the next time the user visits.

    To configure the static cache using the content group use: 

    • The NetScaler configuration utility by going to the navigation panel 
    • Expanding the integrated cache node section. 
    • Select the sub-node from the content group. 
    • Select the name of the node and select the image that need to be added. 
    • Select the image of the average size and add it.
  22. What Will Happen After The Content Group Gets Created?

    After selecting the memory and creating the group the memory setting gets completed and it creates a content group consisting of the images. The creation allows the content groups to be stored as static content objects. These objects use HTTP or HTTPS servers to initiate the caching policies that are being made for the servers.

    The policies consisting of JavaScript, css, etc. It also allows the content group to be selected and used to provide other objects. The configuration utility is used in this case as well and also to expand the integrated caching node that allows objects to be called and passed to include more information.

  23. How Is Server Monitoring Managed Using The Load Balancing Concept?

    Server monitoring is used before configuring the load balancing server. The monitor checks the health of the server and considers it at various levels of criticality. The NetScaler server checks the health of the server periodically as part of its maintenance. Necessary action is taken by checking the server's responses to a specified destination and acting on them appropriately.

    NetScaler system uses a default TCP protocol to monitor automatically the server and its services that are created using the load balancing. Server monitoring is important to be used to allow the transaction between IP addresses can be seen and illegal activities can be blocked if any by the server itself.

  24. How Is The Siebel Server Set Up Using The Default TCP Monitor?

    Siebel servers are easy to set up and it requires some default parameters that can be used by NetScaler. The NetScaler uses the default TCP monitor to view all the security and server health. The server health is important to watch to save it from any failure that might occur during the running of the system.

    The server checks the system and marks the Siebel services as "UP" if all the services are functioning properly. The default parameters that are provided are usually sufficient for a Siebel-based deployment. The monitor is used to view the TCP address and the HTTP transmission of data from one system to another, and to allow only verified data to be transferred.

  25. What Is The Process Of Configuring Siebel 8 Web Virtual Server?

    The process of configuring the Siebel 8 Web virtual server is as follows:

    1. Go to the Navigation Panel that is selected from the NetScaler Configuration Utility and then the load balancing node is expanded to incorporate new changes that is being made or to see the settings of the server. 
    2. Select the Virtual Server sub-node that allows the server to incorporate the changes that is being done and the load balance can be performed more efficiently. The performance of these increases due to this. Click Add to add the sub-node.
    3. Write the name in the field mentioned with the Siebel service like name = _http_80
    4. The server field contains the IP address of the server, for example 172.16.10.242.
    5. The protocol field is set to HTTP; use the pull-down to choose the port to which the service should be bound.
    6. To bind the services of Siebel to virtual server the activation of the services is to be done as this share the load from all the servers and allow the sharing of load to happen evenly.
  26. How Is Session Persistence Used For Siebel?

    Session persistence is given when the NetScaler load balancer is used to select the specific server that directs the server to the client requests. The client can request anything that need not to be sent to the server that is physically located somewhere and the state information can be transferred from one client to another. The session persistence can be enabled by using the cookies that can be configured to insert an HTTP in client responses.

    The cookie is inserted in the field of the header through which the HTTP response is given to the web browser. The web browser is used to accept the cookies that are included using the request of the client made to the server. The method or the persistence option is selected that shows the cookie that is inserted for the use.

  27. What Is The Process To Create The Dynamic Drilldown?

    Dynamic drilldown is a way that allows the drilling of the object through the single source that produce different views on certain conditions when applied gives the results according to the input that is given. The drilldown objects are controlled and allowed to be put using a specific condition.

    The dynamic drilldown is created as follows:

    Drilldown uses the destination object types that define the conditions that need to be determined and the view provides the different destinations to define the child objects. The conditions can be expressed by using the resources taken from you by default. 

    The parent drilldown definition acts as the default that is used when none of the conditions apply, and certain actions can then be taken on the system.

    To create a dynamic drilldown object it requires:

    1. Expansion of the Applet objects type from the object explorer. It allows easy exploration of the object and uses it according to the requirements.

    2. The opportunity list gets selected that is easy for an object list editor. 

    3. The applet that is generated on one applet then the user can pay for the rest so that it can be allowed to display the drill down operations. 

    4. The block is being defined and the block is being given.

  28. What Is The Difference Between Static And Dynamic Drilldowns?

    A static drilldown has a single, fixed destination: the drilldown object definition identifies the hyperlink field and the destination view, and the same destination is always used. The property settings specify the list column or control that is shown as the hyperlink.

    A dynamic drilldown allows the drilldown object to have several destination object definitions. Each destination points to a value of a field through a condition, so the destination view is chosen at runtime based on the value in that field; clicking the hyperlink can therefore lead to different views for different records.

  29. What Is The Difference Between VBC And EBC In Siebel?

    VBC stands for Virtual Business Component. It is a mechanism used in Siebel to display data from an external system in the Siebel application without replicating the fields and data in the Siebel database; the data is fetched at runtime, typically through a business service and transports such as MQSeries or HTTP, and can support operations such as query, insert and update. EBC stands for External Business Component; it provides a way to map a business component directly onto a table that resides in an external database, so the data is accessed in place rather than copied into Siebel.

    A VBC can, for example, show the details of an account or policy stored in an external database, retrieving the stored information on demand. EBCs do not go through a business service in this way; they rely on the external table mapping, and they do not support every standard feature (for example, they are not intended for mobile users working against a local database).

    EBCs sit in the business object layer and run on the Siebel server like ordinary business components, but against an external table, whereas VBCs are configured in Siebel Tools together with the supporting business service.

  30. What Is Siebel Force Active?

    The Force Active setting of TRUE indicates to the system that it must obtain data for the field every time the business component is accessed, even if the field is not displayed in the current applet; this adds the field to the SQL query each time.

  31. Explain The Difference Between A Web Client, Dedicated Client And Mobile Client.

    In the web client, the application is accessed through a web browser; no software needs to be installed on the machine except the web browser, and it connects through the web server.

    For the dedicated client, the client software must be installed on the machine, and the dedicated client connects directly to the Siebel database. The mobile client also has the client software installed locally, but it works against a local database and connects through the Siebel Server to synchronize.

Spark SQL Programming Interview Questions

  1. What Is Shark?

    Most of the data users know only SQL and are not good at programming. Shark is a tool, developed for people who are from a database background – to access Scala MLib capabilities through Hive like SQL interface. Shark tool helps data users run Hive on Spark – offering compatibility with Hive metastore, queries and data.

  2. List Some Use Cases Where Apache Spark Outperforms Hadoop In Processing.

    1. Sensor Data Processing – Apache Spark’s ‘in-memory computing’ works best here, as data is retrieved and combined from different sources.
    2. Real-time querying of data – Spark is preferred over Hadoop for real-time querying of data.
    3. Stream Processing – For processing logs and detecting frauds in live streams for alerts, Apache Spark is the best solution.

  3. What Is A Sparse Vector?

    A sparse vector has two parallel arrays – one for indices and the other for values. These vectors are used for storing non-zero entries to save space, as in the sketch below.
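
    A minimal PySpark sketch (assuming the pyspark package is installed; the numbers are arbitrary):

        from pyspark.ml.linalg import Vectors

        # length-4 vector with non-zero values only at indices 1 and 3
        sv = Vectors.sparse(4, [1, 3], [3.0, 4.0])
        dv = Vectors.dense([0.0, 3.0, 0.0, 4.0])   # the same data stored densely
        print(sv)                                  # (4,[1,3],[3.0,4.0])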

  4. What Is An RDD?

    RDDs (Resilient Distributed Datasets) are the basic abstraction in Apache Spark and represent the data coming into the system in object format. RDDs are used for in-memory computations on large clusters in a fault-tolerant manner. RDDs are read-only, partitioned collections of records that are:

    • Immutable – RDDs cannot be altered.
    • Resilient – If a node holding a partition fails, the data can be recomputed on another node using lineage information.

  5. Explain Transformations And Actions In The Context Of RDDs.

    Transformations are functions executed on demand to produce a new RDD. They are lazily evaluated and run only when an action is invoked. Some examples of transformations include map, filter and reduceByKey.

    Actions are the operations that trigger RDD computations and return results. After an action is performed, the data from the RDD moves back to the local machine (the driver). Some examples of actions include reduce, collect, first, and take. A small example follows.
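
    A minimal PySpark sketch of the difference (assumes a local Spark installation; the data is arbitrary):

        from pyspark import SparkContext

        sc = SparkContext("local[*]", "rdd-demo")
        rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

        # transformations: return new RDDs, nothing is computed yet
        filtered = rdd.filter(lambda kv: kv[1] > 1)
        summed = rdd.reduceByKey(lambda x, y: x + y)

        # actions: trigger the computation and bring results back to the driver
        print(filtered.collect())   # [('b', 2), ('a', 3)]
        print(summed.collect())     # e.g. [('a', 4), ('b', 2)]
        print(rdd.first())          # ('a', 1)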

  6. What Are The Languages Supported By Apache Spark For Developing Big Data Applications?

    Scala, Java, Python, R and Clojure

  7. Can You Use Spark To Access And Analyse Data Stored In Cassandra Databases?

    Yes, it is possible if you use Spark Cassandra Connector.

  8. Is It Possible To Run Apache Spark On Apache Mesos?

    Yes, Apache Spark can be run on the hardware clusters managed by Mesos.

  9. Explain The Different Cluster Managers In Apache Spark.

    The 3 different clusters managers supported in Apache Spark are:

    • YARN
    • Apache Mesos -Has rich resource scheduling capabilities and is well suited to run Spark along with other applications. It is advantageous when several users run interactive shells because it scales down the CPU allocation between commands.
    • Standalone deployments – Well suited for new deployments which only run Spark and are easy to set up.

  10. How Can Spark Be Connected To Apache Mesos?

    To connect Spark with Mesos-

    • Configure the spark driver program to connect to Mesos. Spark binary package should be in a location accessible by Mesos. (or)
    • Install Apache Spark in the same location as that of Apache Mesos and configure the property ‘spark.mesos.executor.home’ to point to the location where it is installed.
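
    A minimal PySpark sketch of the second option (the master URL and install path are placeholders, not real values):

        from pyspark import SparkConf, SparkContext

        conf = (SparkConf()
                .setAppName("mesos-demo")
                .setMaster("mesos://mesos-master.example.com:5050")   # placeholder Mesos master URL
                .set("spark.mesos.executor.home", "/opt/spark"))      # where Spark is installed on the nodes
        sc = SparkContext(conf=conf)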

  11. How Can You Minimize Data Transfers When Working With Spark?

    Minimizing data transfers and avoiding shuffling helps write spark programs that run in a fast and reliable manner. The various ways in which data transfers can be minimized when working with Apache Spark are:

    1. Using Broadcast Variable- Broadcast variable enhances the efficiency of joins between small and large RDDs.
    2. Using Accumulators – Accumulators help update the values of variables in parallel while executing.
    3. The most common way is to avoid operations ByKey, repartition or any other operations which trigger shuffles.

  12. Why Is There A Need For Broadcast Variables When Working With Apache Spark?

    These are read-only variables, present in an in-memory cache on every machine. When working with Spark, the use of broadcast variables eliminates the necessity to ship copies of a variable with every task, so data can be processed faster. Broadcast variables help in storing a lookup table inside the memory, which enhances the retrieval efficiency compared to an RDD lookup().
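
    A minimal PySpark sketch (assumes an existing SparkContext named sc, as in the earlier RDD example; the lookup data is arbitrary):

        lookup = {"IN": "India", "US": "United States"}
        bc = sc.broadcast(lookup)                  # shipped to each executor once

        codes = sc.parallelize(["IN", "US", "IN"])
        names = codes.map(lambda c: bc.value.get(c, "unknown"))   # read-only access via .value
        print(names.collect())                     # ['India', 'United States', 'India']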

  13. Is It Possible To Run Spark And Mesos Along With Hadoop?

    Yes, it is possible to run Spark and Mesos with Hadoop by launching each of these as a separate service on the machines. Mesos acts as a unified scheduler that assigns tasks to either Spark or Hadoop.

  14. What Is Lineage Graph?

    The RDDs in Spark depend on one or more other RDDs. The representation of the dependencies between RDDs is known as the lineage graph. Lineage graph information is used to compute each RDD on demand, so that whenever a part of a persistent RDD is lost, the lost data can be recovered using the lineage graph information.

  15. How Can You Trigger Automatic Clean-ups In Spark To Handle Accumulated Metadata?

    You can trigger the clean-ups by setting the parameter ‘spark.cleaner.ttl’ or by dividing the long running jobs into different batches and writing the intermediary results to the disk.

  16. Explain The Major Libraries That Constitute The Spark Ecosystem.

    • Spark MLlib – Machine learning library in Spark for commonly used learning algorithms like clustering, regression, classification, etc.
    • Spark Streaming – This library is used to process real time streaming data.
    • Spark GraphX – Spark API for graph parallel computations with basic operators like joinVertices, subgraph, aggregateMessages, etc.
    • Spark SQL – Helps execute SQL like queries on Spark data using standard visualization or BI tools.

  17. What Are The Benefits Of Using Spark With Apache Mesos?

    It renders scalable partitioning among various Spark instances and dynamic partitioning between Spark and other big data frameworks.

  18. What Is The Significance Of Sliding Window Operation?

    In Spark Streaming, a sliding window controls which batches of data a computation is applied to. The Spark Streaming library provides windowed computations where the transformations on RDDs are applied over a sliding window of data. Whenever the window slides, the RDDs that fall within that particular window are combined and operated upon to produce new RDDs of the windowed DStream.
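
    A short Spark Streaming sketch of a windowed computation (the source, batch interval, window, and slide durations are arbitrary examples):

      import org.apache.spark.streaming.{Seconds, StreamingContext}

      val ssc   = new StreamingContext(sc, Seconds(5))        // 5-second batches
      val lines = ssc.socketTextStream("localhost", 9999)      // hypothetical source
      val pairs = lines.flatMap(_.split(" ")).map((_, 1))

      // Word counts over the last 30 seconds of data, recomputed every 10 seconds.
      val windowedCounts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
      windowedCounts.print()

      ssc.start()
      ssc.awaitTermination()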

  32. 19. What Is A Dstream?

    A Discretized Stream (DStream) is a sequence of Resilient Distributed Datasets that represents a stream of data. DStreams can be created from various sources like Apache Kafka, HDFS, and Apache Flume. DStreams have two operations –

    • Transformations that produce a new DStream.
    • Output operations that write data to an external system.

  34. 20. When Running Spark Applications, Is It Necessary To Install Spark On All The Nodes Of Yarn Cluster?

    Spark need not be installed when running a job under YARN or Mesos because Spark can execute on top of YARN or Mesos clusters without requiring any change to the cluster.

  36. 21. What Is Catalyst Framework?

    Catalyst framework is a new optimization framework present in Spark SQL. It allows Spark to automatically transform SQL queries by adding new optimizations to build a faster processing system.

  37. 22. Name A Few Companies That Use Apache Spark In Production.

    Pinterest, Conviva, Shopify, Open Table

  39. 23. Which Spark Library Allows Reliable File Sharing At Memory Speed Across Different Cluster Frameworks?

    Tachyon

  41. 24. Why Is Blinkdb Used?

    BlinkDB is a query engine for executing interactive SQL queries on huge volumes of data and renders query results marked with meaningful error bars. BlinkDB helps users balance ‘query accuracy’ with response time.

  43. 25. How Can You Compare Hadoop And Spark In Terms Of Ease Of Use?

    Hadoop MapReduce requires programming in Java which is difficult, though Pig and Hive make it considerably easier. Learning Pig and Hive syntax takes time. Spark has interactive APIs for different languages like Java, Python or Scala and also includes Shark i.e. Spark SQL for SQL lovers – making it comparatively easier to use than Hadoop.

  45. 26. What Are The Common Mistakes Developers Make When Running Spark Applications?

    Developers often make the mistake of:

    • Hitting the web service several times by using multiple clusters.
    • Running everything on the local node instead of distributing it.

    Developers need to be careful with this, as Spark makes use of memory for processing.

  47. 27. What Is The Advantage Of A Parquet File?

    Parquet file is a columnar format file that helps –

    • Limit I/O operations
    • Consume less space
    • Fetch only the required columns

  49. 28. What Are The Various Data Sources Available In Sparksql?

    • Parquet file
    • JSON Datasets
    • Hive tables

  51. 29. How Spark Uses Hadoop?

    Spark has its own cluster management for computation and mainly uses Hadoop for storage.

  53. 30. What Are The Key Features Of Apache Spark That You Like?

    • Spark provides advanced analytic options like graph algorithms, machine learning, streaming data, etc
    • It has built-in APIs in multiple languages like Java, Scala, Python and R
    • It has good performance gains, as it helps run an application in the Hadoop cluster ten times faster on disk and 100 times faster in memory.
  54. 31. What Do You Understand By Pair Rdd?

    Special operations can be performed on RDDs in Spark using key/value pairs and such RDDs are referred to as Pair RDDs. Pair RDDs allow users to access each key in parallel. They have a reduceByKey () method that collects data based on each key and a join () method that combines different RDDs together, based on the elements having the same key.
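
    For example, with made-up data:

      val sales  = sc.parallelize(Seq(("apple", 2), ("pear", 5), ("apple", 3)))
      val prices = sc.parallelize(Seq(("apple", 0.5), ("pear", 0.8)))

      val totals = sales.reduceByKey(_ + _)     // ("apple", 5), ("pear", 5)
      val joined = totals.join(prices)          // ("apple", (5, 0.5)), ("pear", (5, 0.8))
      joined.collect()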

  55. 32. Which One Will You Choose For A Project –hadoop Mapreduce Or Apache Spark?

    Spark makes use of memory instead of network and disk I/O. However, Spark uses a large amount of RAM and requires a dedicated machine to produce effective results. So the decision to use Hadoop or Spark varies dynamically with the requirements of the project and the budget of the organization.

  57. 33. Explain About The Different Types Of Transformations On Dstreams?

    • Stateless Transformations – Processing of the batch does not depend on the output of the previous batch. Examples: map(), reduceByKey(), filter().
    • Stateful Transformations – Processing of the batch depends on the intermediary results of the previous batch. Examples: transformations that depend on sliding windows.

  59. 34. Explain About The Popular Use Cases Of Apache Spark

    Apache Spark is mainly used for

    • Iterative machine learning.
    • Interactive data analytics and processing.
    • Stream processing
    • Sensor data processing
  60. 35. Is Apache Spark A Good Fit For Reinforcement Learning?

    No. Apache Spark works well only for simple machine learning algorithms like clustering, regression, classification.

  61. 36. What Is Spark Core?

    It has all the basic functionalities of Spark, like – memory management, fault recovery, interacting with storage systems, scheduling tasks, etc.

  63. 37. How Can You Remove The Elements With A Key Present In Any Other Rdd?

    Use the subtractByKey() function.
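
    For instance, with illustrative data:

      val main     = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
      val toRemove = sc.parallelize(Seq(("b", 99)))

      // Keeps only the pairs whose key does not appear in the other RDD.
      main.subtractByKey(toRemove).collect()    // Array((a,1), (c,3))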

  64. 38. What Is The Difference Between Persist() And Cache()?

    persist() allows the user to specify the storage level, whereas cache() uses the default storage level (see the sketch below).
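
    A small sketch of the difference (input paths are placeholders):

      import org.apache.spark.storage.StorageLevel

      val logs = sc.textFile("logs.txt")
      logs.cache()                                  // equivalent to persist(StorageLevel.MEMORY_ONLY)

      val events = sc.textFile("events.txt")
      events.persist(StorageLevel.MEMORY_AND_DISK)  // the caller picks the storage level explicitly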

  65. 39. What Are The Various Levels Of Persistence In Apache Spark?

    Apache Spark automatically persists the intermediary data from various shuffle operations; however, it is often suggested that users call the persist() method on an RDD if they plan to reuse it. Spark has various persistence levels to store the RDDs on disk or in memory, or as a combination of both, with different replication levels.

    The various storage/persistence levels in Spark are –

    • MEMORY_ONLY
    • MEMORY_ONLY_SER
    • MEMORY_AND_DISK
    • MEMORY_AND_DISK_SER
    • DISK_ONLY
    • OFF_HEAP
  66. 40. How Spark Handles Monitoring And Logging In Standalone Mode?

    Spark has a web based user interface for monitoring the cluster in standalone mode that shows the cluster and job statistics. The log output for each job is written to the work directory of the slave nodes.

  68. 41. Does Apache Spark Provide Check Pointing?

    Lineage graphs are always useful to recover RDDs from a failure, but this is generally time consuming if the RDDs have long lineage chains. Spark has an API for checkpointing, i.e. a REPLICATE flag to persist. However, the decision on which data to checkpoint is made by the user. Checkpoints are useful when the lineage graphs are long and have wide dependencies.
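
    Checkpointing is enabled roughly like this (the checkpoint directory and input path are placeholders):

      sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

      val cleaned = sc.textFile("data.txt")
        .map(_.toLowerCase)
        .filter(_.nonEmpty)

      cleaned.checkpoint()   // truncates the lineage by saving the RDD to reliable storage
      cleaned.count()        // the checkpoint is written when an action actually runs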

  69. 42. How Can You Launch Spark Jobs Inside Hadoop Mapreduce?

    Using SIMR (Spark in MapReduce), users can run any Spark job inside MapReduce without requiring any admin rights.

  71. 43. How Spark Uses Akka?

    Spark uses Akka basically for scheduling. After registering, all the workers request tasks from the master, and the master simply assigns the tasks. Here Spark uses Akka for messaging between the workers and the masters.

  72. 44. How Can You Achieve High Availability In Apache Spark?

    • Implementing single node recovery with the local file system.
    • Using Standby Masters with Apache ZooKeeper (see the configuration sketch below).
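
    With the ZooKeeper option, the standalone Masters are typically pointed at a ZooKeeper ensemble through spark-env.sh, along the lines of this sketch (host names and paths are placeholders):

      # spark-env.sh on each Master node
      export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
        -Dspark.deploy.zookeeper.url=zk1.example.com:2181,zk2.example.com:2181 \
        -Dspark.deploy.zookeeper.dir=/spark"
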
  73. 45. Hadoop Uses Replication To Achieve Fault Tolerance. How Is This Achieved In Apache Spark?

    The data storage model in Apache Spark is based on RDDs. RDDs help achieve fault tolerance through lineage. An RDD always has the information on how to build itself from other datasets. If any partition of an RDD is lost due to failure, lineage helps rebuild only that particular lost partition.

  74. 46. Explain About The Core Components Of A Distributed Spark Application.

    • Driver – The process that runs the main() method of the program to create RDDs and perform transformations and actions on them.
    • Executor –The worker processes that run the individual tasks of a Spark job.
    • Cluster Manager-A pluggable component in Spark, to launch Executors and Drivers. The cluster manager allows Spark to run on top of other external managers like Apache Mesos or YARN.
  75. 47. What Do You Understand By Lazy Evaluation?

    Spark is intelligent in the manner in which it operates on data. When you tell Spark to operate on a given dataset, it heeds the instructions and makes a note of them, but it does nothing unless asked for the final result. When a transformation like map() is called on an RDD, the operation is not performed immediately. Transformations in Spark are not evaluated until you perform an action. This helps optimize the overall data processing workflow.
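
    For example (the input file is hypothetical):

      val lines   = sc.textFile("data.txt")    // nothing is read yet
      val lengths = lines.map(_.length)        // transformation: still nothing is executed
      val total   = lengths.reduce(_ + _)      // action: only now does Spark read and process the file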

  76. 48. Define A Worker Node.

    A node that can run the Spark application code in a cluster can be called a worker node. A worker node can have more than one worker, which is configured by setting the SPARK_WORKER_INSTANCES property in the spark-env.sh file. Only one worker is started if the SPARK_WORKER_INSTANCES property is not defined.

  77. 49. What Do You Understand By Schemardd?

    An RDD that consists of row objects (wrappers around basic string or integer arrays) with schema information about the type of data in each column.

  78. 50. What Are The Disadvantages Of Using Apache Spark Over Hadoop Mapreduce?

    Apache Spark does not scale well for compute-intensive jobs and consumes a large number of system resources. Apache Spark’s in-memory capability at times becomes a major roadblock for cost-efficient processing of big data. Also, Spark does not have its own file management system and hence needs to be integrated with other cloud-based data platforms or Apache Hadoop.

  79. 51. Is It Necessary To Install Spark On All The Nodes Of A Yarn Cluster While Running Apache Spark On Yarn ?

    No, it is not necessary, because Apache Spark runs on top of YARN.

  80. 52. What Do You Understand By Executor Memory In A Spark Application?

    Every Spark application has the same fixed heap size and fixed number of cores for a Spark executor. The heap size is what is referred to as the Spark executor memory, which is controlled with the spark.executor.memory property or the --executor-memory flag. Every Spark application will have one executor on each worker node. The executor memory is basically a measure of how much memory of the worker node the application will utilize.

  81. 53. What Does The Spark Engine Do?

    The Spark engine schedules, distributes, and monitors the data application across the Spark cluster.

  82. 54. What Makes Apache Spark Good At Low-latency Workloads Like Graph Processing And Machine Learning?

    Apache Spark stores data in-memory for faster model building and training. Machine learning algorithms require multiple iterations to generate an optimal resulting model, and similarly, graph algorithms traverse all the nodes and edges. These low-latency workloads that need multiple iterations benefit greatly from in-memory processing. Less disk access and controlled network traffic make a huge difference when there is a lot of data to be processed.

  83. 55. Is It Necessary To Start Hadoop To Run Any Apache Spark Application ?

    Starting Hadoop is not mandatory to run any Spark application. As there is no separate storage in Apache Spark, it typically uses Hadoop HDFS, but this is not mandatory. The data can be stored in the local file system, loaded from the local file system, and processed there.

  84. 56. What Is The Default Level Of Parallelism In Apache Spark?

    If the user does not explicitly specify it, the number of partitions is considered the default level of parallelism in Apache Spark.

  85. 57. Explain About The Common Workflow Of A Spark Program

    • The foremost step in a Spark program involves creating input RDDs from external data.
    • Use various RDD transformations like filter() to create new transformed RDDs based on the business logic.
    • persist() any intermediate RDDs which might have to be reused in future.
    • Launch various RDD actions like first() and count() to begin parallel computation, which will then be optimized and executed by Spark (see the sketch after this list).
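
    Put together, a typical program might look roughly like this (the file path and filter condition are illustrative):

      val raw   = sc.textFile("hdfs:///data/orders.txt")       // 1. create an input RDD from external data
      val valid = raw.filter(line => !line.startsWith("#"))     // 2. transform it with business logic
      valid.persist()                                            // 3. persist an RDD that will be reused

      val firstRow = valid.first()                               // 4. actions trigger the actual computation
      val rowCount = valid.count()
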
  86. 58. Name A Few Commonly Used Spark Ecosystems.

    Spark SQL (Shark)

    Spark Streaming

    GraphX

    MLlib

    SparkR

  87. 59. What Is “spark Sql”?

    Spark SQL is a Spark interface to work with structured as well as semi-structured data. It has the capability to load data from multiple structured sources like “text files”, JSON files, Parquet files, among others. Spark SQL provides a special type of RDD called SchemaRDD. These are row objects, where each object represents a record.

  88. 60. Can We Do Real-time Processing Using Spark Sql?

    Not directly, but we can register an existing RDD as a SQL table and trigger SQL queries on top of it.
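
    The exact API depends on the Spark version; on Spark 2.x and later the pattern looks roughly like this (older releases use sqlContext and registerTempTable instead, and the names below are illustrative):

      import spark.implicits._                    // assumes an existing SparkSession named 'spark'

      case class Event(id: Int, status: String)
      val events = sc.parallelize(Seq(Event(1, "ok"), Event(2, "failed"))).toDF()

      events.createOrReplaceTempView("events")    // register the data as a SQL table
      spark.sql("SELECT status, count(*) FROM events GROUP BY status").show()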

  89. 61. What Is Spark Sql?

    Spark SQL (which evolved from Shark) is a module introduced in Spark to work with structured data and perform structured data processing. Through this module, Spark executes relational SQL queries on the data. The core of the component supports a different kind of RDD called SchemaRDD, composed of row objects and schema objects defining the data type of each column in the row. It is similar to a table in a relational database.

  90. 62. What Is A Parquet File?

    Parquet is a columnar format file supported by many other data processing systems. Spark SQL performs both read and write operations with Parquet files and considers it to be one of the best big data analytics formats so far.

  91. 63. List The Functions Of Spark Sql.

    Spark SQL is capable of:

    • Loading data from a variety of structured sources
    • Querying data using SQL statements, both inside a Spark program and from external tools that connect to Spark SQL through standard database connectors (JDBC/ODBC). For instance, using business intelligence tools like Tableau
    • Providing rich integration between SQL and regular Python/Java/Scala code, including the ability to join RDDs and SQL tables, expose custom functions in SQL, and more
  92. 64. What Is Spark?

    Spark is a parallel data processing framework. It allows developers to build fast, unified big data applications that combine batch, streaming, and interactive analytics.

  93. 65. What Is Hive On Spark?

    Hive is a component of Hortonworks’ Data Platform (HDP). Hive provides an SQL-like interface to data stored in the HDP. Spark users will automatically get the complete set of Hive’s rich features, including any new features that Hive might introduce in the future.

    The main task in implementing the Spark execution engine for Hive lies in query planning, where Hive operator plans from the semantic analyzer are translated to a task plan that Spark can execute. It also includes query execution, where the generated Spark plan is actually executed in the Spark cluster.

  94. 66. What Is A “parquet” In Spark?

    “Parquet” is a columnar format file supported by many data processing systems. Spark SQL performs both read and write operations with the “Parquet” file.

  95. 67. What Are Benefits Of Spark Over Mapreduce?

    Due to the availability of in-memory processing, Spark implements processing around 10-100x faster than Hadoop MapReduce, whereas MapReduce makes use of persistent storage for all of its data processing tasks.

    • Unlike Hadoop, Spark provides in-built libraries to perform multiple tasks from the same core, like batch processing, streaming, machine learning, and interactive SQL queries. However, Hadoop only supports batch processing.
    • Hadoop is highly disk-dependent whereas Spark promotes caching and in-memory data storage
    • Spark is capable of performing computations multiple times on the same dataset. This is called iterative computation while there is no iterative computing implemented by Hadoop.
  96. 68. How Sparksql Is Different From Hql And Sql?

    SparkSQL is a special component on the Spark Core engine that supports SQL and the Hive Query Language without changing any syntax. It is possible to join a SQL table and an HQL table.