INDEX
Explanations
quantifiers like "even"
the word "even" and its contextual variations
Negative Logits
aim        -0.81
idelines   -0.72
ursed      -0.69
ffen       -0.68
ributes    -0.67
IGHT       -0.67
plex       -0.65
ample      -0.65
isen       -0.64
rend       -0.64
Positive Logits
remotely    1.09
outright    0.77
romeda      0.72
tho         0.70
stranger    0.70
worse       0.70
though      0.67
nam         0.67
moderately  0.63
este        0.62
Activations Density 0.027%