INDEX
Explanations
proper nouns and numbers
numerical data and statistics
Negative Logits
Token          Logit
▓              -0.57
comprom        -0.56
advertisement  -0.56
blockers       -0.55
inev           -0.54
Sov            -0.52
aggrav         -0.52
worrying       -0.51
icing          -0.51
outp           -0.50
Positive Logits
Token          Logit
ⓘ              0.83
Joined         0.72
Died           0.66
itars          0.57
Posts          0.57
ahu            0.56
Female         0.52
Killed         0.52
ometown        0.51
Sets           0.51
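Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The dashboard does not state its exact method, so the following is only a minimal sketch assuming that convention; the vocabulary (`tok_a`..`tok_d`), decoder vector, and unembedding values are made-up toy numbers, not real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive tokens and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Hypothetical toy data: 3-dimensional residual stream, 4-token vocabulary.
decoder_dir = [0.5, -1.0, 0.25]
unembed = {
    "tok_a": [1.0, -0.5, 0.0],
    "tok_b": [0.5, -0.5, 0.5],
    "tok_c": [-0.5, 0.5, 0.0],
    "tok_d": [0.0, 1.0, -1.0],
}

pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```

In a real model the unembedding has one vector per vocabulary entry (tens of thousands of tokens), and the same ranking step yields the ten-entry lists shown on the dashboard.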
Activation density: 0.796%
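Activation density is usually reported as the percentage of sampled tokens on which the feature fires (activation above zero). Assuming that definition, 0.796% means the feature is active on roughly 8 of every 1,000 tokens. A minimal sketch; the `threshold` parameter and the sample activations are hypothetical, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1,000 tokens, 8 of which activate the feature.
sample = [0.0] * 992 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.5, 1.1, 0.7]
print(f"Activation density: {activation_density(sample):.3f}%")
```

A sparse feature like this one is inactive on almost every token, which is why density is typically quoted as a small percentage rather than a raw count.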