INDEX
Explanations
proper nouns or specific names
references to past experiences or states of being
Negative Logits
donald      -0.62
ê           -0.62
ire         -0.61
continue    -0.58
apper       -0.55
rone        -0.55
alist       -0.55
ween        -0.53
abouts      -0.53
mpire       -0.52
Positive Logits
however       0.67
certainly     0.62
definitely    0.60
also          0.59
therefore     0.58
defin         0.56
always        0.53
absolutely    0.51
obviously     0.50
very          0.49
Activations Density: 0.898%