Explanations
words related to expressing knowledge or certainty
Negative Logits
Token      Logit
isco       -0.84
aukee      -0.76
onding     -0.75
issance    -0.72
mage       -0.69
gencies    -0.67
aez        -0.67
pload      -0.67
orthy      -0.66
sidx       -0.66
Positive Logits
Token      Logit
firsthand  1.09
how        0.85
anecd      0.82
nothing    0.80
plenty     0.78
personally 0.75
what       0.74
exactly    0.72
why        0.72
somet      0.69
Activations Density: 0.039%