Explanations
references to different fictional or real-world settings
references to fictional or narrative contexts
Negative Logits
usterity    -0.86
ilee        -0.86
hammad      -0.83
ible        -0.82
idy         -0.75
assic       -0.74
actus       -0.73
agan        -0.72
obe         -0.71
istry       -0.69
Positive Logits
Spray       0.76
aside       0.74
forth       0.71
showc       0.68
Gork        0.66
Setting     0.66
conducive   0.66
ters        0.65
URN         0.64
tle         0.62
Activations Density 0.020%