Explanations
proper nouns
single-letter or short names that likely refer to people or entities
Negative Logits
Token         Logit
wide          -0.73
Driving       -0.67
Wide          -0.65
projector     -0.65
****          -0.64
fluorescent   -0.63
��������      -0.62
1600          -0.62
***           -0.62
DISTRICT      -0.61
Positive Logits
Token         Logit
ipp           0.96
utter         0.89
lem           0.88
ourn          0.87
aren          0.87
essel         0.85
airo          0.84
isner         0.84
ubb           0.84
iller         0.84
Activations Density: 0.183%