INDEX
Explanations
adjectives or phrases related to correctness or appropriateness
instances of the word "appropriate" and its context in relation to behavior or actions
Negative Logits
chet      -0.87
plane     -0.79
planes    -0.78
glass     -0.78
urger     -0.77
cher      -0.76
ker       -0.75
cipl      -0.74
stead     -0.73
peak      -0.73
Positive Logits
tarian           0.85
punishment       0.84
Dragonbound      0.83
circumstances    0.82
punishments      0.80
amounts          0.80
sized            0.78
responses        0.78
appropriate      0.76
attire           0.74
Activations Density: 0.030%