INDEX
Explanations
phrases related to explanations or justifications
mentions of the word "reason" in various contexts
Negative Logits
boro        -0.66
agra        -0.66
KY          -0.65
helicop     -0.65
likeness    -0.64
Riders      -0.61
needle      -0.61
oba         -0.60
inis        -0.60
NetMessage  -0.59
Positive Logits
abl        1.01
reasons    0.95
neum       0.89
sake       0.80
reason     0.79
mpeg       0.75
resy       0.75
asons      0.74
unrelated  0.72
purposes   0.71
Activations Density: 0.028%