INDEX
Explanations
occurrences of the word "on"
Negative Logits
enance      -0.65
aneers      -0.64
BILITIES    -0.63
macros      -0.62
amend       -0.62
minors      -0.61
diluted     -0.60
oother      -0.60
BIP         -0.60
ometimes    -0.59
Positive Logits
nen         1.34
etheless    1.10
stru        1.01
nect        1.01
stant       0.98
etic        0.95
cé          0.93
nie         0.91
osuke       0.88
ni          0.87
Activations Density 0.022%