INDEX
Explanations
instances of the word "plain."
Negative Logits
Explanation   -0.53
NUKAT         -0.53
лтемелер      -0.51
yethyl        -0.50
OFDb          -0.50
bijt          -0.46
Winfield      -0.45
bluzka        -0.45
MIDDLEWARE    -0.45
Destroyer     -0.45
POSITIVE LOGITS
plain     0.71
regime    0.70
regimes   0.66
Regime    0.65
boss      0.62
boot      0.60
Boss      0.59
boots     0.59
Boss      0.58
Boots     0.58
Activations Density 0.053%