INDEX
Explanations
statements related to political figures and their actions or behaviors
Negative Logits
slee          -0.63
board         -0.61
ilated        -0.60
Travels       -0.59
apr           -0.56
rush          -0.56
Offline       -0.55
awoke         -0.55
scrimmage     -0.53
aw            -0.53
Positive Logits
nor           1.67
Nor           1.38
Instead       1.25
Instead       1.25
Rather        1.24
nor           1.23
merely        1.19
Rather        1.17
Nor           1.16
Nevertheless  1.10
Activations Density 5.267%