Explanations
references to historical events, political figures, or locations
references to historical legislative acts and political events
Negative Logits
Originally           -0.57
urai                 -0.45
Variant              -0.44
DragonMagazine       -0.44
puzzled              -0.44
FANTASY              -0.43
surprised            -0.42
arij                 -0.42
Nerd                 -0.41
natureconservancy    -0.41
Positive Logits
)).                  0.83
.).                  0.75
]."                  0.72
]).                  0.72
%.                   0.65
".                   0.63
''.                  0.62
%).                  0.62
'.                   0.61
).                   0.61
Activations Density: 3.733%