Explanations
Phrases or words related to debates, discussions, or questions
Negative Logits
uras        -0.64
amina       -0.58
hetical     -0.58
urances     -0.58
iaries      -0.55
ogly        -0.54
istically   -0.54
alty        -0.53
ionage      -0.53
urance      -0.53
Positive Logits
raging       0.71
naire        0.70
puff         0.70
raged        0.69
ably         0.66
moderators   0.65
halla        0.64
debate       0.63
ively        0.61
dayName      0.60
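The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way feature dashboards compute such lists, assumed here rather than confirmed for this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below uses hypothetical names (`logit_effects`, `top_logits`) and a tiny made-up two-dimensional embedding purely for illustration; its numbers are not the values listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the model's unembedding: one vector per token.
Unembedding = Dict[str, List[float]]


def logit_effects(decoder_dir: List[float], unembed: Unembedding) -> Dict[str, float]:
    """Dot the feature's (assumed) decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with made-up 2-d vectors (not real model weights).
decoder_dir = [1.0, 0.5]
toy_unembed: Unembedding = {
    "debate": [0.5, 0.2],
    "ably": [0.3, 0.4],
    "uras": [-0.5, -0.3],
    "amina": [-0.4, -0.2],
}
effects = logit_effects(decoder_dir, toy_unembed)
negative, positive = top_logits(effects, k=2)
for tok, val in negative + positive:
    print(f"{tok:<8}{val:+.2f}")
```

In a real model the same ranking would run over the full vocabulary, so the lists show only the extremes of a much longer distribution.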
Activation Density: 8.125%
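Activation density is usually reported as the fraction of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and the toy activations are hypothetical and unrelated to the sample behind the 8.125% figure.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations for 8 tokens; only one is above zero.
toy_acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(toy_acts)
print(f"Activation Density: {density:.3%}")
```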