INDEX
Explanations
phrases describing deprivation or suffering
negations, particularly forms of the word "not."
Negative Logits
Reloaded     -0.78
ĪĴ           -0.70
ħĭ           -0.68
hemor        -0.63
Poster       -0.62
behavi       -0.61
Passenger    -0.61
Ĥİ           -0.60
Penguin      -0.60
çĦ           -0.59
Positive Logits
ween         1.02
weet         0.92
unes         0.91
reprene      0.89
une          0.86
ract         0.83
achment      0.83
urb          0.82
aper         0.80
obi          0.79
Activations Density: 0.099%