Explanations
content that follows a specific pattern or signal in the text
references to social media interactions
Negative Logits
immer       -0.75
imm         -0.75
access      -0.70
elo         -0.68
govtrack    -0.66
mud         -0.65
ice         -0.65
vere        -0.64
adle        -0.63
eri         -0.63
Positive Logits
noon         0.83
Following    0.75
Following    0.75
ĸļ           0.73
follows      0.70
teen         0.70
LLOW         0.68
Follow       0.68
Steps        0.68
SourceFile   0.67
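The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of that ranking. It assumes the per-token scores are already available as a NumPy array (for example, the feature's decoder direction multiplied by the model's unembedding matrix; this page does not say how they were computed). The helper name `top_and_bottom_tokens` is hypothetical, and the toy input reuses five scores from the lists.

```python
import numpy as np

def top_and_bottom_tokens(logit_effects, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: `logit_effects` holds one score per vocabulary entry, e.g. a
    feature's decoder direction projected onto the unembedding matrix.
    """
    order = np.argsort(logit_effects)        # indices sorted by ascending score
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy input: five tokens and scores copied from the logit lists.
vocab = ["noon", "immer", "Steps", "access", "elo"]
scores = np.array([0.83, -0.75, 0.68, -0.70, -0.68])
negative, positive = top_and_bottom_tokens(scores, vocab, k=2)
print(negative)  # [('immer', -0.75), ('access', -0.7)]
print(positive)  # [('noon', 0.83), ('Steps', 0.68)]
```

Sorting once and reading both ends of the order keeps the two lists consistent with each other. With a full vocabulary, `np.argpartition` would avoid a full sort, but plain `argsort` is simpler to read here.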
Activation density: 0.012%
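Activation density is a fraction of tokens, so 0.012% means the feature is active on roughly 12 of every 100,000 tokens. The sketch below assumes density is the share of dataset tokens whose activation is strictly above zero. Both that threshold and the helper name `activation_density` are assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature fires.

    Assumption: "fires" means activation strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy input: 12 firing tokens out of 100,000 tokens.
acts = [0.0] * 99_988 + [1.5] * 12
density = activation_density(acts)
print(f"{density:.3%}")  # 0.012%
```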