INDEX
Explanations
mentions of specific individuals
specific names or references to individuals associated with a topic
Negative Logits
ATURES      -0.73
eming       -0.70
:(          -0.69
SPONSORED   -0.68
odes        -0.67
Oracle      -0.66
Muslim      -0.66
Padres      -0.65
eneg        -0.63
OSED        -0.63
Positive Logits
çͰ          0.78
Coul        0.69
shove       0.67
imeters     0.66
CPR         0.66
virt        0.63
ãĤ¼         0.62
ienne       0.61
rek         0.61
intraven    0.59
Activations Density 0.000%