INDEX
Explanations
phrases indicating certainty or knowledge
statements reflecting certainty or awareness
Negative Logits
isco       -0.91
onding     -0.78
issance    -0.75
aez        -0.72
acies      -0.71
sidx       -0.71
aukee      -0.69
gencies    -0.69
erva       -0.67
taboola    -0.67
Positive Logits
firsthand   1.01
how         0.82
exactly     0.79
nothing     0.77
plenty      0.71
nothing     0.71
what        0.71
ledged      0.70
personally  0.68
beforehand  0.66
Activations Density 0.051%