INDEX
Explanations
the title "Mr" used before names
Negative Logits
actionGroup   -0.84
destro        -0.76
appre         -0.70
routed        -0.68
staging       -0.66
tolerate      -0.65
isolate       -0.65
anwhile       -0.62
overshadow    -0.61
cruc          -0.60
Positive Logits
Hyde          0.82
Claus         0.82
Universe      0.81
Robot         0.76
rahim         0.76
Joseph        0.73
Hussein       0.72
Farage        0.70
.,            0.70
Chips         0.69
Activations Density 0.030%