INDEX
Explanations
statements expressing personal preferences or beliefs
expressions of strong personal identity and self-description
Negative Logits
disappear      -0.69
vanish         -0.66
indefinitely   -0.66
scrimmage      -0.66
Regions        -0.64
VERTISEMENT    -0.64
advances       -0.63
allocations    -0.62
enges          -0.62
çķ             -0.61
Positive Logits
lucky          0.93
believer       0.91
myself         0.91
fortunate      0.84
proud          0.80
opic           0.76
skept          0.73
obsessed       0.73
skeptical      0.72
strong         0.71
Activations Density 0.230%