Explanations
words related to specific names or entities
names and terms related to individuals and possible false information or deceit
Negative Logits
ablishment    -0.93
Zucker        -0.83
steen         -0.79
Kart          -0.73
bered         -0.73
asury         -0.73
uality        -0.72
Freeze        -0.71
eele          -0.71
utsu          -0.70
Positive Logits
ouse          0.74
ĪĴ            0.74
Ĩ             0.72
chemistry     0.71
sheet         0.70
oyd           0.69
compe         0.69
ŃĶ            0.69
seaf          0.68
enthusi       0.67
Activations Density 0.016%
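
The density figure can be read as the share of tokens on which this feature fires. The sketch below illustrates that reading only; it assumes density means the fraction of tokens with activation above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 2 of 12,500 tokens.
sample = [0.0] * 12_498 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.016% corresponds to roughly one firing token in every 6,250.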