INDEX
Explanations
phrases containing terms related to political ideologies or extremist views
terms related to political extremism, particularly referencing far-left and far-right ideologies
Negative Logits
Cola             -0.67
CODE             -0.63
LAW              -0.61
advertisement    -0.61
Plot             -0.61
behav            -0.59
Nightmares       -0.59
dayName          -0.59
Owner            -0.58
>:               -0.57
Positive Logits
reaching         0.87
ranging          0.87
coe              0.81
ishly            0.80
sighted          0.78
hest             0.78
bent             0.76
fetched          0.76
away             0.73
rency            0.72
Activations Density 0.073%