INDEX
Explanations
the word "so" in the text
the repeated phrase "so" used to emphasize statements
Negative Logits
theless         -0.60
marks           -0.58
haven           -0.57
wreck           -0.56
work            -0.56
works           -0.56
presentation    -0.54
女              -0.53
INGTON          -0.53
customs         -0.53
Positive Logits
bered           1.26
oths            1.20
othes           1.17
apy             1.05
oner            0.98
othe            0.97
oooo            0.95
iled            0.94
ooo             0.91
aps             0.89
Activations Density 0.115%