INDEX
Explanations
phrases indicating contrast or exceptions
instances of the word "but" indicating contrast or shifting of ideas
Negative Logits
enced           -0.61
encyclopedia    -0.56
Excellence      -0.54
pursu           -0.52
æĿ              -0.52
naire           -0.52
Orient          -0.51
ãģ£             -0.50
Cycle           -0.50
Corps           -0.50
Positive Logits
chers           1.12
chery           1.09
tons            1.00
ts              0.86
ickets          0.84
alas            0.80
lers            0.77
ters            0.77
ler             0.77
nevertheless    0.75
Activations Density: 0.152%