INDEX
Explanations
references to specific names or titles mentioned in a longer text
Negative Logits
Token     Logit
ctica     -0.98
ITY       -0.95
lished    -0.90
ITAL      -0.87
hedral    -0.87
ity       -0.85
IAN       -0.83
today     -0.80
اÙĦ       -0.80
ertodd    -0.80
Positive Logits
Token     Logit
terday    1.14
asers     1.07
asing     1.06
asure     1.00
aser      1.00
velt      0.93
ldon      0.92
ases      0.91
oman      0.90
ats       0.90
Activations Density 0.810%