INDEX
Explanations
references to "Alice in Wonderland" and related literary elements
Negative Logits
Token           Logit
strike          -0.15
velt            -0.15
LEX             -0.14
kiss            -0.14
AttributeName   -0.14
[".             -0.14
slick           -0.14
Ì               -0.14
odega           -0.14
Segue           -0.14
Positive Logits
Token           Logit
Alice           0.50
Alice           0.44
Wonderland      0.42
alice           0.40
Lewis           0.36
alice           0.32
Lewis           0.32
Carroll         0.27
Alic            0.25
Jab             0.25
Activations Density: 0.009%