INDEX
Explanations
sentences starting with the word "So"
Negative Logits
saf         -0.73
thro        -0.64
nic         -0.60
sk          -0.58
exclusive   -0.57
Purg        -0.57
Halls       -0.56
degree      -0.56
ski         -0.56
◼           -0.55
Positive Logits
oner        1.31
bered       1.09
fter        1.07
FTWARE      1.04
apy         1.02
ooo         0.96
oths        0.96
othes       0.95
far         0.92
aring       0.90
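The two logit lists rank vocabulary tokens by how strongly this feature pushes the model's output toward them (positive) or away from them (negative). A minimal sketch of how such lists are commonly produced, assuming (this is not stated on the page) that each value is the feature's decoder vector multiplied by the model's unembedding matrix. The helper names `logit_effects` and `top_and_bottom` are hypothetical, and plain lists stand in for real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder vector through the unembedding.

    Assumption: decoder_row has length d_model and unembed is a
    d_model x vocab_size matrix. Returns one logit effect per token.
    """
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example with a 2-dimensional model and a 4-token vocabulary.
effects = logit_effects([1.0, -0.5],
                        [[0.2, -0.4, 0.9, 0.0],
                         [0.6, 0.2, -0.2, 0.4]])
top, bottom = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
print("positive:", [tok for tok, _ in top])     # positive: ['c', 'a']
print("negative:", [tok for tok, _ in bottom])  # negative: ['b', 'd']
```

With real weights the same ranking, taken with k = 10, would yield lists shaped like the ones above; whether the dashboard applies any extra normalization is not stated here.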
Activation Density: 0.228%
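Activation density is the share of tokens on which the feature fires. A minimal sketch, assuming a token counts as "active" when its activation is strictly above zero (typical for ReLU-based sparse autoencoders; the dashboard's actual threshold and dataset are not given). The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: threshold=0.0, i.e. any positive activation counts as firing.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy example: the feature fires on 228 of 100,000 tokens.
acts = [0.0] * 99_772 + [1.5] * 228
print(f"{activation_density(acts):.3%}")  # 0.228%
```

A density of 0.228% therefore means the feature is active on roughly 1 token in 440, consistent with a narrow pattern such as sentence-initial "So".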