INDEX
Explanations
references to a specific name or entity
mentions of the word "Der"
Negative Logits
Samoa       -0.67
INTER       -0.66
Elephant    -0.65
odcast      -0.63
toggle      -0.62
[|          -0.62
hetti       -0.61
ZI          -0.61
Fenrir      -0.60
ships       -0.59
Positive Logits
ricks       1.08
bys         1.04
ivation     0.99
ived        0.87
bil         0.87
rick        0.86
ription     0.84
Spiegel     0.84
rek         0.82
cliffe      0.81
Activations Density: 0.021%