INDEX
Explanations
instances of the word "Int" or related variations, likely indicating references to intelligence or introspection
Negative Logits
elian       -0.15
perror      -0.15
ldb         -0.14
rsp         -0.14
mouseout    -0.14
atrice      -0.14
hatt        -0.14
Norm        -0.14
afi         -0.14
ahi         -0.14
Positive Logits
RODUCTION   0.19
ention      0.19
umes        0.19
roducing    0.19
ended       0.18
emann       0.18
rog         0.18
ensive      0.18
ros         0.17
érieur      0.17
Activations Density 0.032%