Explanations
references to specific historical figures or texts
Negative Logits
Token         Logit
ftime         -0.20
abinet        -0.17
veter         -0.14
cuckold       -0.14
ÙĦÙ쨩         -0.14
جÙĨ          -0.14
ocard         -0.14
son           -0.14
apsulation    -0.14
cast          -0.14
Positive Logits
Token         Logit
Anne          0.38
Anne          0.31
Frank         0.30
anne          0.26
Frank         0.25
diary         0.24
Otto          0.24
Amsterdam     0.24
Holland       0.23
Dutch         0.23
Activations Density: 0.001%