Explanations
phrases related to research findings or experimental results
references to experimental or survey outcomes
Negative Logits
cer        -0.69
capital    -0.65
eways      -0.64
hopping    -0.64
kers       -0.63
oos        -0.63
ACP        -0.62
vid        -0.62
leisure    -0.61
Rim        -0.61
Positive Logits
results    1.13
results    1.10
Results    1.01
result     0.93
Results    0.92
result     0.85
iments     0.83
ĸļ         0.76
ULTS       0.76
inct       0.74
Activations Density: 0.017%