Explanations
references to political and societal issues
Negative Logits
manoeuv      -0.70
Trog         -0.67
laughter     -0.66
Vald         -0.63
blinded      -0.62
travellers   -0.61
boro         -0.61
agony        -0.61
doubles      -0.60
Alic         -0.60
Positive Logits
³³³                1.33
³³³³               1.22
³³³³³³³³³³³³³³³³   1.21
³³³³³³³³           1.16
³³                 1.07
Reason             1.05
Specifically       0.98
ccording           0.95
Firstly            0.94
Consider           0.93
Activations Density 0.420%