Explanations
names of individuals
references to the word "lad" and related variations
Negative Logits
ELL           -0.83
kward         -0.69
plays         -0.67
forts         -0.67
EED           -0.67
basketball    -0.66
Bakr          -0.64
ticking       -0.64
中            -0.63
oldown        -0.62
Positive Logits
imir          1.41
der           1.04
isl           0.93
mir           0.92
ynam          0.92
itionally     0.85
itional       0.82
amus          0.80
ewater        0.79
rian          0.78
Activations Density 0.036%