INDEX
Explanations
numbers and percentages
occurrences of the substring "fe"
Negative Logits
Token       Logit
GEAR        -0.75
worthy      -0.73
ANGEL       -0.72
pora        -0.70
agically    -0.66
nas         -0.66
antically   -0.63
mag         -0.63
azines      -0.63
inates      -0.63
Positive Logits
Token       Logit
eling       1.15
ck          1.04
cker        1.00
els         0.99
lda         0.98
encing      0.93
elin        0.93
cking       0.92
ffer        0.92
zza         0.91
Activations Density: 0.011%