INDEX
Explanations
statements emphasizing what is known or certain
phrases asserting knowledge or certainty about various topics
Negative Logits
pex        -0.93
ksh        -0.75
osi        -0.73
onies      -0.70
ermanent   -0.70
rentice    -0.68
cohol      -0.68
onial      -0.65
oling      -0.64
inance     -0.64
Positive Logits
ourselves      0.92
how            0.85
beforehand     0.80
ledged         0.72
ledge          0.72
anecd          0.68
guesses        0.68
lege           0.68
ETH            0.68
definitively   0.67
Activations Density: 0.092%