Explanations
expressions of strong affection or admiration
Negative Logits
Token      Logit
̧          -0.17
awy        -0.15
rech       -0.15
karak      -0.15
canf       -0.15
@student   -0.14
acie       -0.14
immel      -0.14
solete     -0.14
regist     -0.13
Positive Logits
Token      Logit
idge        0.17
Bounding    0.17
itty        0.17
ahy         0.15
eza         0.15
Traits      0.14
arus        0.14
'gc         0.14
IPA         0.14
ataka       0.14
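The dashboard does not say how these logit values are produced. A common approach, and only an assumption here, is to project the feature's direction onto the vocabulary and list the most negative and most positive entries. The sketch below covers only that final selection step: given a per-token effect vector, it returns the k lowest and k highest entries. The function name `top_logits` is hypothetical, and the sample values are taken from the two lists above. Because the listed values are rounded, ties are common. Python's `sorted` is stable, so tied tokens keep their input order.

```python
from typing import List, Tuple

Pair = Tuple[str, float]

def top_logits(vocab: List[str], effects: List[float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most negative and k most positive (token, effect) pairs."""
    if len(vocab) != len(effects):
        raise ValueError("vocab and effects must have the same length")
    pairs = list(zip(vocab, effects))
    # Ascending sort puts the most negative effects first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Descending sort puts the most positive effects first; sorted() is stable, so ties keep input order.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive

# Sample tokens and rounded values copied from the lists above.
vocab = ["awy", "rech", "regist", "idge", "Bounding", "Traits"]
effects = [-0.15, -0.15, -0.13, 0.17, 0.17, 0.14]
neg, pos = top_logits(vocab, effects, k=2)
print(neg)  # [('awy', -0.15), ('rech', -0.15)]
print(pos)  # [('idge', 0.17), ('Bounding', 0.17)]
```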
Activation density: 0.023%
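A density of 0.023% corresponds to roughly 23 tokens out of every 100,000. The sketch below shows that arithmetic. It assumes that density means the percentage of tokens on which the feature's activation is strictly above a threshold of zero. The dashboard does not state this definition, and both the `threshold` parameter and the function name `activation_density` are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is considered active (assumed: activation > threshold).
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy data: 23 active tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(23):
    acts[i] = 1.0
print(activation_density(acts))  # 0.023
```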