INDEX
Explanations
words or prefixes related to the idea of an "ideal" state or standard
references to ideologies or ideals
Negative Logits
Token      Logit
enegger    -0.94
iona       -0.85
ichick     -0.85
ufact      -0.78
wagen      -0.74
cffff      -0.71
lished     -0.69
ions       -0.69
é¾         -0.68
ishable    -0.67
Positive Logits
Token      Logit
gger       1.03
ll         0.91
lli        0.91
ously      0.82
maid       0.82
lla        0.81
Dhabi      0.81
llo        0.79
llan       0.74
vice       0.71
Activations Density 0.044%