INDEX
Explanations
emotionally charged terms and expressions of personal experiences
Negative Logits
Token            Logit
in               -0.88
تقاوى            -0.70
aining           -0.66
inti             -0.66
inals            -0.64
ining            -0.64
in               -0.63
ing              -0.62
line             -0.61
principalColumn  -0.61
Positive Logits
Token            Logit
cutt              0.71
aga               0.63
certa             0.61
comb              0.61
rema              0.58
surpris           0.58
beg               0.58
mak               0.57
expla             0.56
liv               0.56
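The page does not say how these logit lists were produced. A common convention for sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below assumes that convention; `top_logit_effects`, `decoder_row`, `unembed`, and `vocab` are hypothetical names, and the toy matrices stand in for real model weights.

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by a feature's direct logit effect.

    Assumes decoder_row has shape (d_model,) and unembed has shape (d_model, vocab_size).
    """
    # Direct effect of the feature direction on every output logit.
    effects = decoder_row @ unembed
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_row = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.2, 0.9, -0.7],
                    [0.0, 0.0, 0.0, 0.0]])
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_effects(decoder_row, unembed, vocab, k=2)
print(negative)
print(positive)
```

Sorting once and slicing from both ends yields the two lists from a single pass over the effect vector.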
Activation Density: 0.107%
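Activation density is usually read as the share of token positions on which the feature fires. Under that assumption, a value of 0.107% means the feature is active on roughly 1 token in 935. The helper below is a hypothetical sketch of that calculation; the name `activation_density` and the default threshold of 0 are assumptions, not taken from the page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where activation exceeds threshold."""
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(activations > threshold))

# Toy activations over 8 token positions: the feature fires on 2 of them.
density = activation_density([0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"{density:.3%}")
```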