INDEX
Explanations
instances of the word "even"
the word "even."
Negative Logits
Token     Logit
rend      -0.97
asus      -0.88
aim       -0.83
cel       -0.75
othy      -0.74
idem      -0.73
obi       -0.73
ugal      -0.72
scribe    -0.72
acker     -0.71
Positive Logits
Token         Logit
though        1.07
tho           0.95
remotely      0.93
mentioning    0.72
indirectly    0.69
handedly      0.69
worse         0.69
handed        0.69
joking        0.67
moderately    0.66
Activations Density 0.041%