INDEX
Explanations
mentions of things that are large, heavy, or unwieldy
words related to classification
Negative Logits
awaru       -0.76
htt         -0.66
rece        -0.66
Territory   -0.66
EMENT       -0.64
Palestin    -0.63
Democr      -0.60
MODE        -0.59
TAIN        -0.59
PLIED       -0.59
Positive Logits
ipper       1.21
ojure       1.19
amped       1.18
ashing      1.16
avier       1.15
ogged       1.14
amps        1.14
iques       1.08
utch        1.07
ique        1.07
Activations Density: 0.014%