Explanations
names related to a particular person, possibly a politician
words related to reductions or decreases in various contexts
Negative Logits
printf   -0.71
Ü        -0.67
sung     -0.66
lived    -0.66
DOC      -0.61
nep      -0.59
SUP      -0.59
mbuds    -0.58
サ       -0.57
live     -0.57
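Some vocabulary entries in these lists are raw byte-level strings rather than readable text. Assuming the model uses GPT-2's byte-level BPE tokenizer (an assumption; the dashboard does not name the model), a raw string such as ãĤµ is the bytes E3 82 B5 written through GPT-2's byte-to-character table, which decodes to the katakana サ. Under the same assumption, Ü would be a lone 0xDC byte, the first half of a two-byte UTF-8 character, so it does not decode on its own. A minimal sketch of the decoding, rebuilding that table:

```python
# Sketch: recover readable text from GPT-2-style byte-level BPE token strings.
# Assumption: the model behind this dashboard uses GPT-2's byte-to-unicode table.

def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to a printable character, as GPT-2's encoder does."""
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    n = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable bytes are shifted to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: printable stand-in character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw vocabulary string back into UTF-8 text (U+FFFD for partial characters)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤµ"))              # サ
print(repr(decode_token("Ġlived")))     # ' lived' (Ġ stands for a leading space)
print(decode_token("Ü") == "\ufffd")    # True: a lone lead byte is not valid UTF-8
```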
Positive Logits
uce      1.39
uces     1.11
llor     1.01
uction   0.98
uced     0.95
xual     0.94
eer      0.90
lect     0.87
lectic   0.85
ucing    0.82
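The dashboard does not say how the Positive and Negative Logits columns are computed. A common convention in sparse-autoencoder dashboards (assumed here, not confirmed by this page) is to multiply the feature's decoder direction by the unembedding matrix W_U and list the largest and smallest entries. A minimal sketch of that ranking, with hypothetical toy numbers and the final LayerNorm ignored:

```python
# Sketch: rank vocabulary tokens by a feature's direct effect on the logits.
# Assumption: effect = decoder_direction @ W_U, ignoring the final LayerNorm.

def logit_effects(decoder_row, unembed, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs.

    decoder_row: the feature's decoder vector, length d_model.
    unembed:     W_U as a list of d_model rows, each of length len(vocab).
    """
    scores = [
        sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first, as in the dashboard
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy numbers (hypothetical, not taken from this feature):
vocab = ["uce", "uces", "lived", "live", "DOC"]
decoder_row = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.4, -0.3, 0.1],
    [0.4, 0.2, -0.2, -0.6, 0.0],
]
positive, negative = logit_effects(decoder_row, unembed, vocab, k=2)
print([tok for tok, _ in positive])  # ['uce', 'uces']
print([tok for tok, _ in negative])  # ['live', 'lived']
```

With real weights the same ranking would be done with a single matrix-vector product over the full vocabulary; the nested loop here only keeps the sketch dependency-free.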
Activations Density 0.012%
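An activation density of 0.012% means the feature fires on roughly 12 of every 100,000 tokens, so it is very sparse. A minimal sketch of how such a figure can be computed, assuming "fires" means an activation strictly above zero (the usual convention for ReLU sparse autoencoders; the page does not state its threshold) and using a hypothetical sample:

```python
# Sketch: activation density as the percentage of tokens on which a feature fires.
# Assumption: "fires" means activation > threshold, with threshold 0 by default.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 12 firing positions out of 100,000 tokens.
sample = [0.0] * 99_988 + [2.5] * 12
print(f"{activation_density(sample):.3f}%")  # 0.012%
```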