Explanations
adjectives describing the nature or quality of something
negative evaluations or critiques of various subjects
Negative Logits
UF         -0.71
aha        -0.69
OME        -0.69
¬¼         -0.69
uay        -0.69
tz         -0.68
ickets     -0.66
rero       -0.66
vation     -0.65
asio       -0.65
Positive Logits
albeit     1.15
but        0.96
although   0.94
namely     0.93
however    0.93
though     0.92
except     0.91
whereas    0.88
insofar    0.87
huh        0.78
Activations Density 0.336%
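
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading. It assumes density means "fraction of positions with activation above a threshold", which the dashboard itself does not state. The function name `activation_density` and the toy data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of which activate the feature.
toy = [0.0] * 997 + [0.4, 1.2, 0.05]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.336% would mean the feature is active on roughly 3 or 4 of every 1000 token positions.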