INDEX
Explanations
instances of comprehension and awareness in context
Negative Logits
onta      -0.16
older     -0.15
aná       -0.15
usc       -0.15
uggy      -0.14
itou      -0.14
alaria    -0.14
igham     -0.14
abin      -0.14
allon     -0.14
Positive Logits
fully         0.27
ably          0.26
fully         0.23
completely    0.23
why           0.22
ings          0.21
Fully         0.20
为什么        0.20
about         0.19
better        0.19
Activations Density 0.064%