INDEX
Explanations
social media or communication-related headings
occurrences of the string 'w/' in the text
Negative Logits
agos  -0.72
obliged  -0.65
Reconstruction  -0.65
emort  -0.63
yrinth  -0.62
snipp  -0.62
Lumpur  -0.60
istar  -0.59
vertis  -0.59
terness  -0.59
Positive Logits
sole  0.68
stood  0.66
hold  0.64
AFP  0.64
lain  0.64
lde  0.63
Mot  0.63
ÏĢ  0.62
ste  0.61
ATIONS  0.61
Activation Density: 0.025%
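The card does not say how its numbers are computed. Here is a minimal sketch, assuming two things. First, the two logit lists are simply the k most negative and k most positive per-token effects of this feature on the output logits. Second, the density is the percentage of token positions where the feature's activation is strictly positive. The function names `activation_density` and `top_logits` are hypothetical and are not part of any dashboard tool's API.

```python
from typing import Dict, List, Sequence, Tuple


def activation_density(activations: Sequence[float]) -> float:
    """Hypothetical helper: percentage of token positions where the feature fires (> 0)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive token effects.

    Assumes `effects` maps each token string to the feature's effect on that token's logit.
    """
    # Most negative first (ascending order).
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    # Most positive first (descending order).
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy input built from a few entries listed on the card above.
    effects = {"agos": -0.72, "obliged": -0.65, "sole": 0.68, "stood": 0.66}
    print(top_logits(effects, 1))
    # One firing position out of 4000 gives a density of 0.025%.
    print(activation_density([1.0] + [0.0] * 3999))
```

Under these assumptions, a density of 0.025% corresponds to the feature firing on roughly 1 token position in 4000.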