Explanations
references to investigations or examinations being conducted
terms related to official investigations or inquiries
Negative Logits
Extrem      -0.75
AMES        -0.68
Decl        -0.68
Aid         -0.66
IAN         -0.65
compe       -0.64
vari        -0.62
reconc      -0.61
relative    -0.61
activated   -0.61
Positive Logits
probe          1.14
Probe          1.10
probes         1.09
probing        1.08
inquiry        0.86
agher          0.84
investigating  0.76
investigation  0.76
gauge          0.74
questioning    0.72
Activations Density 0.010%