INDEX
Explanations
words related to physical injuries or impacts
words that begin with "er" or are related to errors or discrepancies
Negative Logits
itionally       -0.78
ership          -0.71
parency         -0.70
IGHTS           -0.68
OUNT            -0.66
hips            -0.64
TPPStreamerBot  -0.63
INESS           -0.63
erness          -0.63
itional         -0.62
Positive Logits
er              0.99
aser            0.94
asures          0.86
asure           0.83
ogenous         0.76
asing           0.74
rett            0.73
odox            0.73
rant            0.73
awa             0.72
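The negative and positive lists rank vocabulary tokens by how much this feature lowers or raises their output logits. Dashboards of this kind commonly obtain such values by taking the dot product of the feature's decoder direction with each token's unembedding vector. The source does not state the exact definition, so the sketch below assumes that plain dot product with no extra normalization. The helpers `logit_effects` and `top_k` and the toy vectors are hypothetical, not real model weights.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: assumes a token's logit effect is the plain dot product
# of the feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


# Hypothetical helper: returns the k tokens with the largest effects
# (largest=True, the "positive" list) or the smallest, most negative effects
# (largest=False, the "negative" list), strongest first.
def top_k(effects: Dict[str, float], k: int,
          largest: bool = True) -> List[Tuple[str, float]]:
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


if __name__ == "__main__":
    # Toy 2-dimensional example; these are made-up numbers.
    decoder_dir = [1.0, 0.5]
    unembed = {
        "a": [0.8, 0.4],
        "b": [0.2, 0.0],
        "c": [-0.6, 0.2],
        "d": [-0.1, -0.4],
    }
    effects = logit_effects(decoder_dir, unembed)
    print("positive:", top_k(effects, 2))
    print("negative:", top_k(effects, 2, largest=False))
```

Sorting the same effect dictionary in both directions is what yields the paired lists: the positive list is the top of the ranking and the negative list is the bottom, each ordered by strength.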
Activation Density: 0.006%
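Activation density is usually read as the share of token positions on which the feature is active. Under that reading, 0.006% corresponds to about 6 activations per 100,000 tokens (0.006 / 100 = 6e-5), so this is a very sparse feature. The sketch below assumes that definition and a firing threshold of zero. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Sequence


# Hypothetical helper: assumes "activation density" is the fraction of token
# positions at which the feature's activation exceeds a threshold (0 here).
def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Toy data: the feature fires on 6 of 100,000 tokens.
    acts = [0.0] * 99_994 + [1.0] * 6
    print(f"{activation_density(acts) * 100:.3f}%")
```

Run as a script, this prints `0.006%`. A nonzero threshold would be a design choice for ignoring tiny activations; the dashboard does not say whether one is used.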