INDEX
Explanations
words related to political figures or events
proper nouns and names, particularly those associated with events and public figures
Negative Logits
Inquis    -1.02
FE        -1.00
Fei       -0.96
FE        -0.91
Ez        -0.83
isf       -0.83
FD        -0.83
fe        -0.80
Flip      -0.78
Flo       -0.78
Positive Logits
rams      0.98
roma      0.85
rom       0.81
haar      0.79
RAM       0.79
SAM       0.78
Bav       0.78
ARA       0.78
ram       0.78
Ram       0.78
Activations Density 0.606%