INDEX
Explanations
mentions of specific names or terms related to automobiles and chemicals
references to specific people or entities, particularly those associated with politics or cultural issues
Negative Logits
edin        -0.96
ablishment  -0.89
anamo       -0.87
rat         -0.86
untu        -0.86
yo          -0.82
raved       -0.81
alg         -0.80
ilon        -0.79
amen        -0.79
Positive Logits
flame       0.76
::::::::    0.75
flare       0.75
══          0.73
Pryor       0.73
Lens        0.71
Ago         0.69
[|          0.68
▬           0.68
>>>>>>>>    0.66
Activations Density 0.046%