Explanations
phrases related to expressing strong opinions or reactions, often involving praise or criticism
Negative Logits
Token           Logit
anwhile         -0.77
ilight          -0.72
grounds         -0.71
existed         -0.70
adj             -0.69
estate          -0.68
with            -0.67
Laboratories    -0.67
eworks          -0.64
there           -0.62
Positive Logits
Token           Logit
bang            1.11
vengeance       1.04
impunity        0.97
newfound        0.97
dignity         0.88
vig             0.87
flourish        0.87
gust            0.85
limp            0.85
suitcase        0.84
Activations Density: 0.242%