INDEX
Explanations
instances of confusion or mistaken identity
instances of the word "confused."
Negative Logits
inth        -0.74
©¶æ         -0.71
bors        -0.68
ighth       -0.67
ILA         -0.66
events      -0.62
home        -0.62
HEAD        -0.61
projects    -0.61
igers       -0.60
Positive Logits
confused         1.17
confuse          1.07
bewild           0.91
confusion        0.90
ingly            0.89
baffled          0.86
Leilan           0.84
fully            0.82
misunderstand    0.81
misled           0.81
Activations Density 0.010%