Explanations
references to specific animal species, particularly foxes and platypuses, along with mentions of particular historical events
Negative Logits
Token        Logit
Rated        -0.43
itutional    -0.40
apter        -0.40
atoon        -0.40
Norn         -0.39
acements     -0.38
Edison       -0.36
Effective    -0.36
Kuh          -0.35
assy         -0.35
Positive Logits
Token        Logit
es           0.63
naire        0.55
holes        0.47
hound        0.47
conn         0.47
hog          0.46
eln          0.45
hole         0.45
manship      0.45
hun          0.45
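The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists can be produced, assuming the common convention that a feature's logit effect is its decoder vector multiplied by the model's unembedding matrix. The names `logit_effects`, `top_and_bottom`, `decoder_vec`, and `unembed` are hypothetical, the matrix layout (rows = residual dimensions, columns = vocabulary tokens) is an assumption, and this is not the dashboard's actual code.

```python
from typing import Sequence


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Project one feature's decoder vector through the unembedding.

    Assumed layout: unembed[i][j] is the weight from residual
    dimension i to vocabulary token j.
    """
    n_vocab = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(n_vocab)
    ]


def top_and_bottom(decoder_vec, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs."""
    effects = logit_effects(decoder_vec, unembed)
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Most positive first, matching the order of the positive list.
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
unembed = [[1.0, -1.0, 0.5, 0.0],
           [0.0, 0.0, 0.5, 2.0]]
neg, pos = top_and_bottom([1.0, 0.2], unembed, ["a", "b", "c", "d"], k=2)
print([t for t, _ in neg])  # ['b', 'd']
print([t for t, _ in pos])  # ['a', 'c']
```

Because only the feature's direction matters here, the lists describe what the feature would write to the output if it fired, independent of any particular input text.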
Activation Density: 8.423%
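A minimal sketch of how an activation density figure like this can be computed, assuming density means the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical illustrations, not data or code from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical activations over 8 tokens; the feature fires on 2 of them.
print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 25.0
```

Under this reading, a density of 8.423% means the feature is active on roughly one token in twelve of the sampled text.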