INDEX
Explanations
specific names or terms that refer to people or characters
Negative Logits
Token    Logit
hurst    -0.16
&&&&     -0.15
izard    -0.15
afen     -0.14
addy     -0.14
hof      -0.14
ifton    -0.14
eddar    -0.14
hevik    -0.14
lint     -0.14
Positive Logits
Token      Logit
ucc        0.17
REW        0.16
Richards   0.15
tart       0.15
val        0.15
diff       0.14
fog        0.14
mond       0.14
Dag        0.14
essel      0.14
Activations Density: 0.003%