Explanations
References to personal relationships and pronouns
Negative Logits
832        -0.17
411        -0.17
azi        -0.15
rama       -0.14
nar        -0.14
song       -0.14
άζ         -0.14
616        -0.14
ocity      -0.14
393        -0.14
Positive Logits
iot        0.16
Dumpster   0.15
äºľ        0.14
etsk       0.14
enic       0.14
Cham       0.14
PropTypes  0.14
ħ          0.14
iso        0.13
æĿ         0.13
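Each logit list is a top-k ranking of per-token scores. The sketch below shows how such a ranking could be produced, assuming the input maps every vocabulary token to the feature's direct effect on that token's logit (for example, the decoder direction multiplied by the unembedding matrix). The function name `top_logit_effects` and its input format are hypothetical, not the dashboard's actual code.

```python
import heapq


def top_logit_effects(logit_effects, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: `logit_effects` is assumed to map each token string
    to the feature's direct effect on that token's logit.
    """
    items = logit_effects.items()  # dict view; safe to iterate twice
    # Most negative first, like the "Negative Logits" list.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    # Most positive first, like the "Positive Logits" list.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


neg, pos = top_logit_effects(
    {"iot": 0.16, "Dumpster": 0.15, "832": -0.17, "azi": -0.15, "song": -0.14},
    k=2,
)
print(neg)  # [('832', -0.17), ('azi', -0.15)]
print(pos)  # [('iot', 0.16), ('Dumpster', 0.15)]
```

Using `heapq.nsmallest`/`nlargest` avoids sorting the whole vocabulary when only a handful of tokens are displayed.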
Activation density: 0.459%
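Activation density is read here as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero (typical for a ReLU-style feature); the helper name and the threshold are assumptions, and the counts in the example are hypothetical numbers chosen only to reproduce 0.459%.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper; assumes a feature counts as active when its
    activation is strictly greater than `threshold` (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical example: 459 active positions out of 100,000 tokens.
acts = [1.0] * 459 + [0.0] * (100_000 - 459)
print(activation_density(acts))  # 0.459
```

A density this low means the feature is silent on the vast majority of tokens, which is what makes its few activations interpretable.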