Explanations
the word "so" indicating a logical conclusion or continuation in a sentence
Negative Logits
Token        Logit
Cree         -0.67
Neigh        -0.64
ropolitan    -0.60
Kids         -0.59
burg         -0.59
Dre          -0.58
WN           -0.58
gie          -0.57
Tact         -0.57
Milan        -0.57
Positive Logits
Token        Logit
forth        1.45
forth        1.08
bered        1.03
othe         1.01
oths         0.98
apy          0.97
ooo          0.88
oooo         0.86
oner         0.83
far          0.81
Activations Density: 0.026%