Explanations
phrases indicating personal insight or knowledge
phrases that emphasize understanding or acknowledgment
Negative Logits
Token        Logit
utenberg     -0.66
aez          -0.62
anmar        -0.61
recomm       -0.60
arthed       -0.58
ossibility   -0.57
omal         -0.55
acco         -0.55
fortun       -0.55
mination     -0.55
Positive Logits
Token        Logit
what         1.08
why          1.00
how          1.00
terday       0.97
WHAT         0.88
whats        0.85
what         0.81
exactly      0.75
lege         0.71
somet        0.70
Activations Density 0.034%