Explanations
expressions related to introspection and self-reflection
Negative Logits
enta     -0.15
redits   -0.15
iel      -0.14
ίνη      -0.13
eson     -0.13
IGHT     -0.13
635      -0.13
isque    -0.13
leton    -0.13
913      -0.13
Positive Logits
it       0.54
它       0.37
It       0.35
it       0.34
It       0.34
_it      0.32
nó       0.31
itu      0.27
it       0.26
,it      0.26
Activations Density: 0.444%