Explanations
mentions of sources or attributions in text
references to sources or citations in a document
Negative Logits
  ucket      -0.78
  raising    -0.74
  frogs      -0.69
  pad        -0.67
  payer      -0.66
  inary      -0.66
  eared      -0.65
  loss       -0.65
  owl        -0.65
  thin       -0.65
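Logit lists like this one are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative entries (and, for a positive list, the most positive ones). The sketch below assumes exactly that recipe. It ignores any final layer norm or rescaling a given dashboard might apply, and the decoder direction, vocabulary, and unembedding vectors are hypothetical toy values, not this feature's weights.

```python
# Minimal sketch (assumption): a feature's effect on token t's logit is the dot
# product of the feature's decoder direction with t's unembedding vector.
# All vectors and tokens below are hypothetical toy values.
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project the decoder direction onto every token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    decoder_dir = [0.5, -1.0, 0.25]  # hypothetical feature direction
    unembed = {                      # hypothetical unembedding vectors
        "Via": [1.0, -0.5, 0.0],
        "owl": [-0.5, 0.5, 0.0],
        "loss": [0.0, 0.2, -1.0],
        "Wikimedia": [0.8, -0.2, 0.4],
    }
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Ranking by the raw dot product keeps the sketch simple. With a real model, the decoder row and unembedding matrix would come from the trained SAE and language model.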
Positive Logits
  Via         1.02
  Via         0.95
  via         0.93
  Wikimedia   0.90
  ulture      0.78
  ultural     0.77
  (blank)     0.73
  Religion    0.70
  ï¸ı         0.67
  WARD        0.66
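The "ï¸ı" entry is consistent with a GPT-2-style byte-level BPE vocabulary. Such a vocabulary displays each raw byte as a printable character, and "ï", "¸", "ı" map back to the bytes EF B8 8F, the UTF-8 encoding of U+FE0F (variation selector-16, an invisible character often placed after emoji). This sketch assumes the dashboard uses that byte-to-unicode table. It rebuilds the table and decodes display strings. "ĠVia" is a hypothetical space-prefixed spelling, included to show how two distinct "Via" tokens can render identically once the leading space is dropped.

```python
# Sketch (assumption): tokens are shown in GPT-2's byte-level alphabet, where
# every byte 0-255 is mapped to a printable Unicode character.


def bytes_to_unicode() -> dict:
    """Rebuild the GPT-2-style byte -> printable-character table."""
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    # Non-printable bytes are shifted to code points 256, 257, ... in byte order.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level display string back into the text it encodes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "ĠVia" is a hypothetical space-prefixed variant; Ġ stands for byte 0x20.
    for tok in ["Via", "ĠVia", "ï¸ı"]:
        decoded = decode_token(tok)
        print(repr(tok), "->", repr(decoded), [f"U+{ord(c):04X}" for c in decoded])
```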
Activation Density: 0.009%
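A density of 0.009% means the feature is active on roughly 9 of every 100,000 tokens, or about 1 token in 11,000. This sketch assumes "density" is the fraction of sampled tokens whose activation exceeds zero. The threshold and the synthetic activation sample are hypothetical illustrations, not the dashboard's actual data or code.

```python
# Sketch (assumption): activation density = fraction of tokens on which the
# feature's activation is above a threshold (0.0 by default).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Share of positions whose activation is strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 100,000 tokens, the feature fires on 9 of them.
    acts = [0.0] * 100_000
    for i in range(0, 90_000, 10_000):
        acts[i] = 1.5
    print(f"{activation_density(acts):.3%}")
```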