INDEX
Explanations
mentions of being hired or fired
the word "fired" and its variations or related concepts
Negative Logits
Bei         -0.75
alter       -0.74
adra        -0.70
Shant       -0.68
Gamble      -0.63
Dunk        -0.62
doi         -0.58
åij         -0.58
Wolfgang    -0.56
Sheen       -0.56
Positive Logits
ired        1.22
IRED        1.06
iring       1.03
ategor      0.91
ansas       0.84
ilage       0.83
ragon       0.75
reenshots   0.72
rived       0.72
irements    0.72
Activations Density: 0.010%