INDEX
Explanations
phrases related to negative or critical opinions about something
negative expressions or sentiments
Negative Logits
Token       Logit
Shank       -0.66
strengthened -0.65
Reloaded    -0.64
Tik         -0.63
hardness    -0.59
Irwin       -0.59
DERR        -0.59
untled      -0.59
Nicarag     -0.59
hardened    -0.58
Positive Logits
Token       Logit
recomm      0.96
mom         0.93
distance    0.93
years       0.92
favorite    0.91
diff        0.91
dri         0.90
prison      0.90
dist        0.89
comments    0.88
Activation Density: 0.109%