Explanations
expressions of high emotional intensity or strong opinions
Negative Logits
Token     Logit
ibar      -0.17
vie       -0.16
ovable    -0.15
logen     -0.15
ailer     -0.14
onso      -0.14
leta      -0.14
reira     -0.14
efe       -0.14
adoo      -0.14
Positive Logits
Token     Logit
Pros      0.20
pros      0.17
Pros      0.17
Overall   0.15
ajs       0.15
overall   0.15
Aws       0.14
âĢı       0.14
overall   0.14
íĺ¼       0.14
Activations Density 0.043%