INDEX
Explanations
statements expressing personal opinions about various topics
Negative Logits
idium      -0.79
istry      -0.73
assisted   -0.67
paced      -0.66
concess    -0.65
Mandatory  -0.62
bailed     -0.61
oiler      -0.60
staking    -0.60
cius       -0.60
Positive Logits
ãĤ¤ãĥĪ     0.86
myself     0.75
IU         0.73
使         0.73
ourselves  0.71
omething   0.70
¶æ         0.70
yourself   0.70
oneself    0.69
ophile     0.69
Activations Density 8.522%