INDEX
Explanations
words related to political or ideological shifts, particularly the concept of defection
words related to deception or deceit
Negative Logits
fulness     -0.79
viks        -0.76
Sioux       -0.75
trans       -0.73
Fargo       -0.70
sent        -0.69
frames      -0.67
mob         -0.64
QC          -0.60
lifetime    -0.59
Positive Logits
ection      1.50
ected       1.32
rador       1.00
ector       0.98
ect         0.97
ective      0.96
ipel        0.88
naire       0.85
ary         0.84
ocol        0.83
Activations Density: 0.011%