INDEX
Explanations
words or parts of words containing "ew"
the word "new"
Negative Logits
cort        -0.71
retri       -0.67
Downloadha  -0.66
burg        -0.65
unarmed     -0.63
apprehend   -0.63
thieves     -0.63
REDACTED    -0.62
administr   -0.61
esthetic    -0.61
Positive Logits
estern      1.24
atts        0.99
esley       0.98
ew          0.97
een         0.96
sburg       0.95
olf         0.93
ild         0.92
eeks        0.92
ITNESS      0.91
Activations Density: 0.009%