INDEX
Explanations
names of people, likely from news articles
Negative Logits
tenance         -0.65
WARN            -0.65
hormonal        -0.63
Terry           -0.62
cereal          -0.62
Aid             -0.61
Intermediate    -0.61
Bland           -0.61
FR              -0.61
riber           -0.61
Positive Logits
aret            2.27
aspers          1.96
MK              1.85
este            1.83
opard           1.80
alus            1.19
Latest          1.06
Kop             0.96
rero            0.96
Latest          0.91
Activations Density: 0.042%