Explanations
actions and activities that individuals engage in or express
Negative Logits
amespace    -0.20
Toolkit     -0.17
abant       -0.16
ilon        -0.16
overy       -0.15
飯          -0.15
елов        -0.15
ffset       -0.14
erial       -0.14
imers       -0.14
Positive Logits
Gauss       0.15
igung       0.15
inction     0.15
koc         0.15
분          0.15
ple         0.14
sonst       0.13
remodel     0.13
YTE         0.13
oldt        0.13
Activations Density 0.147%
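
The density figure above can be read with a small sketch. It assumes, since the page does not define it, that activation density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical and chosen only to reproduce a value of 0.147%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means "share of tokens where the feature fires";
    the dashboard itself does not state its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 147 of 100,000 token positions.
sample = [1.0] * 147 + [0.0] * 99_853
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density this low would mean the feature is sparse: it is inactive on the vast majority of tokens.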