Explanations
adjectives and adverbial phrases that describe characteristics or behaviors
Negative Logits
unger       -0.16
citizen     -0.15
uner        -0.14
viewer      -0.14
essenger    -0.14
rray        -0.13
celik       -0.13
ileo        -0.13
lever       -0.13
Rays        -0.13
Positive Logits
TORT        0.16
endez       0.15
پس          0.15
æ¾          0.14
chy         0.14
Pall        0.14
斯特        0.14
nell        0.14
gord        0.14
ulu         0.14
Activations Density: 0.004%