INDEX
Explanations
the word "but" in various contexts
Negative Logits
Token       Logit
Berger      -0.66
antic       -0.63
Mandela     -0.63
Brett       -0.63
Holocaust   -0.62
Patri       -0.62
lies        -0.62
Rover       -0.61
???         -0.59
Roose       -0.58
Positive Logits
Token       Logit
sts         0.99
term        0.95
chery       0.89
tes         0.89
chel        0.88
Reviewer    0.88
chers       0.87
chie        0.86
aceous      0.84
ters        0.84
Activations Density: 0.063%