INDEX
Explanations
phrases expressing negative sentiments or actions
Negative Logits
tero          -0.20
opes          -0.16
ail           -0.15
vox           -0.15
wort          -0.14
imar          -0.14
íĦ°           -0.13
à¸Ľà¸£à¸°à¸Īำ    -0.13
erosis        -0.13
.LoadScene    -0.13
Positive Logits
GGLE          0.17
ób            0.17
toward        0.17
towards       0.15
exc           0.15
aran          0.15
Tow           0.15
ledo          0.15
ies           0.15
ļĮ            0.15
Activations Density 0.016%