INDEX
Explanations
statements showing seriousness, concern, or action towards specific issues or incidents
terms related to serious concerns or allegations
Negative Logits
Token       Logit
ificent     -0.64
Smile       -0.64
Retrieved   -0.62
Machina     -0.60
regate      -0.58
etheless    -0.58
alde        -0.57
otten       -0.57
Stretch     -0.56
wonders     -0.54
Positive Logits
Token       Logit
seriously   1.54
Seriously   1.26
aback       1.06
hostage     1.06
lightly     0.98
offline     0.88
into        0.87
stride      0.84
literally   0.84
VERY        0.83
Activations Density: 0.105%