INDEX
Explanations
the word "ant" and its variations
Negative Logits
Token           Logit
conse           -0.54
fale            -0.54
pital           -0.49
Geplaatst       -0.49
Doble           -0.48
Titi            -0.47
kicking         -0.47
Pura            -0.47
facts           -0.46
ValueGenerated  -0.46
Positive Logits
Token  Logit
ant    2.06
Ant    1.55
ant    1.52
Ant    1.41
ANT    1.27
__":   0.88
ANT    0.84
__':   0.82
antre  0.81
antro  0.77
Activations Density: 0.058%