INDEX
Explanations
Instances of the word "mis" repeated multiple times, likely indicating a focus on detecting words related to mistakes or missteps in the text.
Negative Logits
ILA         -0.67
INGS        -0.65
unto        -0.64
eteria      -0.63
¯¯¯¯        -0.61
sans        -0.61
Destroyer   -0.60
Robots      -0.60
Ready       -0.59
ieri        -0.59
Positive Logits
cellaneous  1.42
appropri    1.30
beh         1.18
pelled      1.10
behavior    1.10
informed    1.09
aligned     1.07
character   1.03
jud         1.01
managed     1.01
Activations Density 0.014%