INDEX
Explanations
names or terms related to individuals
mentions of specific names or entities
Negative Logits
-+-+        -0.82
ï¸ı         -0.76
Adin        -0.69
Emb         -0.67
REF         -0.66
catentry    -0.66
RET         -0.64
Legend      -0.62
Skydragon   -0.61
Drift       -0.61
Positive Logits
hof         0.95
opol        0.81
ocent       0.78
iae         0.78
utical      0.75
anchester   0.74
igans       0.74
acht        0.73
nikov       0.72
ican        0.72
Activations Density 0.065%