INDEX
Explanations
the presence of specific character sequences
Negative Logits
hon      -0.19
eur      -0.18
sville   -0.18
eum      -0.17
eus      -0.17
t        -0.16
eurs     -0.16
eed      -0.16
h        -0.16
overy    -0.16
Positive Logits
sembl      0.40
ylum       0.37
semblies   0.37
bestos     0.37
sembled    0.35
paragus    0.34
pects      0.34
sembler    0.34
semble     0.33
sembling   0.32
Activations Density 0.036%
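
The page does not define the density figure. A minimal sketch of one common reading, assuming it is the fraction of token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical and chosen only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 10,000 token positions, the feature fires on 4 of them.
acts = [0.0] * 10_000
for i in (12, 480, 5_150, 9_001):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.036% would correspond to the feature firing on roughly 36 out of every 100,000 tokens.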