Explanations
references to comparisons or contrasts between different groups or individuals
after capitalized words/names
negative informal criticism
Negative Logits
ciasc            -0.80
våre             -0.68
habet            -0.66
quæ              -0.65
citoy            -0.65
AssemblyTitle    -0.64
yder             -0.63
généralement     -0.61
quelquefois      -0.61
plufieurs        -0.61
Positive Logits
stuff      1.13
stupid     1.10
thing      1.09
damn       0.89
crappy     0.89
stupid     0.87
thingy     0.87
crap       0.87
damned     0.85
shitty     0.85
Activations Density 0.579%
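
The density figure above is not defined in this listing. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is nonzero. The sketch below illustrates that arithmetic only. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 tokens, 6 of which activate the feature.
toy = [0.0] * 994 + [0.3, 1.2, 0.7, 2.1, 0.05, 0.9]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.600%"
```

Under this reading, a density of 0.579% would mean the feature fires on roughly 58 of every 10,000 sampled tokens.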