Explanations
Significant nouns, particularly those related to people’s experiences and emotions.
Negative Logits
ona         -0.17
STE         -0.15
796         -0.15
ONA         -0.15
Ste         -0.14
ior         -0.14
Inlining    -0.14
еса         -0.14
_tunnel     -0.14
TECTED      -0.14
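Some token strings in these lists look garbled because byte-level BPE tokenizers show each raw byte as a printable stand-in character. The sketch below assumes the model uses the GPT-2 `bytes_to_unicode` table. The dashboard does not name the tokenizer, so treat that as an assumption. It maps a displayed token back to its bytes and decodes them as UTF-8. `decode_token` is a hypothetical helper name.

```python
# Sketch, assuming a GPT-2-style byte-level BPE vocabulary (not confirmed by
# the dashboard): recover readable text from displayed token strings.

def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 mapping from each byte value (0-255) to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable bytes are shifted to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Reverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map stand-in characters to bytes, then decode UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")
```

Under this mapping, `decode_token("ÐµÑģÐ°")` returns "еса", `decode_token("ÑĩÐµ")` returns "че", and `decode_token("ĠDoyle")` returns " Doyle", because the leading `Ġ` stands for a space byte.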
Positive Logits
asil         0.17
ombat        0.15
че           0.15
kowski       0.15
ùng          0.15
Doyle        0.15
ehr          0.14
Pell         0.14
veal         0.14
oud          0.14
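Dashboards like this one often build the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most extreme entries. The dashboard does not state its exact method. Details such as layer-norm folding or scaling are omitted here. The function name `top_logit_effects` and the `(d_model, vocab_size)` shape of `W_U` are assumptions for illustration.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: k most negative and k most positive (token, logit) pairs.

    decoder_dir: feature decoder vector, shape (d_model,)
    W_U: unembedding matrix, shape (d_model, vocab_size) (assumed layout)
    vocab: list of token strings indexed by token id
    """
    logits = decoder_dir @ W_U          # direct effect on each token's logit
    order = np.argsort(logits)          # token ids sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
# neg -> [("d", -0.5), ("b", -0.2)], pos -> [("a", 0.3), ("c", 0.1)]
```

Only the decoder direction matters here, not the activation magnitude. For that reason these lists describe what the feature pushes the model toward or away from, independent of which inputs make it fire.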
Activation Density: 0.002%
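Activation density is usually the fraction of sampled token positions on which the feature is active. A density of 0.002% is 0.00002, or roughly one token in 50,000, which makes this a very sparse feature. The sketch below assumes "active" means an activation strictly above zero. The threshold and the helper name `activation_density` are assumptions.

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: fraction of positions where the feature exceeds threshold."""
    acts = np.asarray(feature_acts)
    return float(np.mean(acts > threshold))


# 2 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[[10, 500]] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.002%
```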