INDEX
Explanations
political and legal language
Negative Logits
Token         Logit
Sard          -0.68
Mob           -0.67
qqa           -0.67
ãĤ´           -0.66
ned           -0.66
APD           -0.66
etary         -0.65
SourceFile    -0.65
Mechdragon    -0.64
ciples        -0.63
Positive Logits
Token          Logit
uh             1.08
gasp           0.98
um             0.94
maybe          0.81
ah             0.79
yeah           0.79
wait           0.79
yeah           0.79
interrupted    0.77
look           0.74
Activations Density: 0.240%