INDEX
Explanations
phrases related to negative or risky situations
punctuation marks, specifically commas
Negative Logits
Token        Logit
ļéĨĴ         -0.69
irc          -0.69
seizure      -0.67
ivery        -0.64
rarily       -0.61
uci          -0.60
ery          -0.60
uble         -0.60
quartered    -0.58
irie         -0.58
Positive Logits
Token        Logit
albeit       1.14
namely       1.09
although     1.03
except       0.91
though       0.89
alas         0.88
depending    0.88
but          0.87
whereas      0.86
including    0.85
Activations Density: 0.669%