Explanations
instances of words related to emphasis and strong assertions in discussions
Negative Logits
Token      Logit
PT         -0.16
adam       -0.15
isas       -0.15
verse      -0.15
VERSE      -0.15
ish        -0.15
hood       -0.15
chet       -0.14
isle       -0.14
von        -0.14
Positive Logits
Token      Logit
ington     0.17
point      0.16
/max       0.15
unden      0.15
nox        0.15
chrift     0.15
erus       0.15
rina       0.15
points     0.15
ingleton   0.14
Activations Density: 0.039%