INDEX
Explanations
terms related to intentional actions or behaviors
occurrences of the words "deliberate" and "intentional."
Negative Logits
WB            -0.73
href          -0.70
asta          -0.68
Rated         -0.68
Tycoon        -0.68
Kinnikuman    -0.67
amy           -0.67
Neighbor      -0.67
models        -0.66
ĻĤ            -0.65
Positive Logits
deliberate    1.05
intentional   0.90
ãĥĥãĤ¯        0.77
deliber       0.73
theless       0.73
foul          0.71
disson        0.70
attempt       0.67
wrongdoing    0.66
drift         0.65
Activations Density: 0.011%
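
Two of the tokens listed above (`ĻĤ` and `ãĥĥãĤ¯`) look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes, not ordinary text. The sketch below shows one way to map such strings back to bytes. It assumes the tokenizer uses the GPT-2 byte-to-unicode table, which this page does not state, and the helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to bytes, then decode as UTF-8.

    Byte sequences that are not valid UTF-8 on their own (for example a
    token covering only part of a multi-byte character) are rendered as
    U+FFFD replacement characters.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


decoded_katakana = decode_token("ãĥĥãĤ¯")  # "ック" under the assumed table
decoded_fragment = decode_token("ĻĤ")      # not valid UTF-8 by itself
decoded_plain = decode_token("deliberate")  # ASCII tokens are unchanged
```

Under that assumption, `ãĥĥãĤ¯` is the katakana sequence `ック`, while `ĻĤ` is a pair of continuation bytes that only forms a character when combined with a neighbouring token.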