Explanations
names related to a specific individual or company
repeated mentions of a specific individual named Roh
Negative Logits
IMAGES       -0.80
iliary       -0.79
dotted       -0.78
ciating      -0.71
Panther      -0.65
tongues      -0.64
littered     -0.64
fathers      -0.64
Occupations  -0.63
paycheck     -0.62
Positive Logits
rer    0.94
Roh    0.93
mer    0.89
der    0.86
ata    0.83
rin    0.83
atche  0.83
sten   0.79
ani    0.78
ita    0.77
Activations Density 0.016%