INDEX
Explanations
references to assistance or help-seeking behaviors
Negative Logits
shrinks      -0.79
плек         -0.75
lujah        -0.74
scared       -0.73
Everybody    -0.71
thinks       -0.71
girls        -0.70
lasyon       -0.70
boss         -0.69
people       -0.68
Positive Logits
utilizing    1.01
Дан          1.00
utilising    0.98
אשר          0.97
Дан          0.96
poichè       0.95
میباشد       0.94
данного      0.91
alábbi       0.90
tevens       0.89
Activations Density: 1.220%