Explanations
positive or affirmative expressions and strong emotional sentiments
Negative Logits
cha       -0.17
ยว        -0.16
otti      -0.16
takson    -0.15
arak      -0.15
urette    -0.15
ruk       -0.15
atori     -0.14
ิว        -0.14
.pa       -0.14
Positive Logits
tron       0.16
uin        0.15
usz        0.15
osit       0.15
among      0.14
barr       0.14
mapped     0.14
μει        0.14
Avg        0.14
phia       0.14
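Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The source does not say how these particular numbers were computed, so the sketch below assumes that approach. The function names, the toy decoder vector, the toy unembedding, and the four-token vocabulary are all hypothetical and are not this feature's actual weights.

```python
# Hypothetical sketch: rank a feature's direct logit effects.
# Assumption: effect[token] = dot(decoder_vec, unembedding column for token).

def logit_effects(decoder_vec, unembed, vocab):
    """Project a decoder vector (length d_model) onto each unembedding column.

    unembed is a d_model x vocab_size nested list; column j belongs to vocab[j].
    """
    d_model = len(decoder_vec)
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
    return effects


def top_logits(effects, k):
    """Return (positive, negative): the k highest and k lowest (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
decoder_vec = [1.0, 0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.1, 0.2, -0.4, 0.0],
]
vocab = ["a", "b", "c", "d"]

positive, negative = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```

In this toy case token "d" has the largest positive effect and "c" the most negative, mirroring how the lists above pair each token with a signed value.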
Activations Density 0.003%
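An activation density of 0.003% works out to about 3 firing tokens per 100,000 (0.003 / 100 = 3e-5). The sketch below assumes density means the fraction of sampled tokens on which the feature's activation is strictly positive; the source does not define the term, and the function name and toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the fraction of tokens that fire.
# Assumption: a token "fires" when its activation is strictly above threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations that exceed threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 100,000 tokens, of which only 3 have a positive activation.
activations = [0.0] * 99_997 + [1.2, 0.4, 2.0]
density = activation_density(activations)
print(f"{density * 100:.3f}%")  # density expressed as a percentage
```

A feature this sparse fires rarely, so its top activating examples carry most of the evidence behind the explanation.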