Explanations
expressions conveying certainty or confidence in understanding or doing something
first-person references to knowledge and self-awareness
Negative Logits
Alive        -0.66
••           -0.62
Legends      -0.59
nowhere      -0.58
Chron        -0.58
mony         -0.58
awareness    -0.56
Dram         -0.56
Fortune      -0.55
guiActive    -0.55
Positive Logits
're          0.83
mean         0.81
mean         0.75
fuss         0.74
doing        0.74
Mean         0.73
/$           0.72
meant        0.71
entail       0.70
need         0.69
Activations Density 0.084%