INDEX
Explanations
instances of the word "report" with varying strengths of activation
requests and instances of reporting various issues or concerns
Negative Logits
creen      -0.78
erie       -0.69
ierre      -0.62
wich       -0.62
osi        -0.60
gauge      -0.59
frey       -0.59
hement     -0.57
aned       -0.57
ozy        -0.57
Positive Logits
sightings       0.77
inacc           0.75
ribe            0.74
ufact           0.73
misconduct      0.72
irregularities  0.71
ounces          0.70
inaccur         0.70
rapes           0.69
eeds            0.68
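The page does not say how the logit lists were computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the highest- and lowest-scoring vocabulary tokens. The sketch below uses plain Python. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the numbers are toy values, not this feature's weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix laid out as d_model rows x vocab_size columns.
    Returns one score per vocabulary token (assumed method, see lead-in)."""
    if len(direction) != len(unembed):
        raise ValueError("direction length must equal the number of unembedding rows")
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    if len(scores) != len(vocab):
        raise ValueError("scores and vocab must have the same length")
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 hypothetical tokens.
    direction = [1.0, -0.5]
    unembed = [[0.2, 0.9, -0.4],
               [0.6, -0.2, 0.8]]
    vocab = ["a", "b", "c"]
    positive, negative = top_and_bottom(logit_effects(direction, unembed), vocab, k=1)
    print([tok for tok, _ in positive], [tok for tok, _ in negative])
```

Under this reading, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes away. That fits the positive list above (sightings, misconduct, irregularities) and the "report" explanations.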
Activation Density: 0.071%
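Activation density is usually read as the fraction of sampled token positions on which the feature is active. The dataset and threshold behind this figure are not stated, so both are assumptions here. At 0.071% (0.00071), the feature fires on roughly 71 of every 100,000 tokens, or about 1 token in 1,400. The helper `activation_density` below is a hypothetical, minimal sketch.

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.
    `activations` holds one list of per-token activations per sequence.
    Treating "active" as "> 0" is an assumption, not the dashboard's stated rule."""
    total = 0
    active = 0
    for sequence in activations:
        for value in sequence:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


if __name__ == "__main__":
    # Two toy sequences: 2 active positions out of 7.
    sample = [[0.0, 0.5, 0.0], [0.0, 0.0, 1.2, 0.0]]
    print(f"{activation_density(sample):.3%}")
```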