Explanations
references to legal actions and consequences
instances of the word "face" in relation to legal or disciplinary consequences
Negative Logits
Token     Logit
ucky      -0.73
ary       -0.71
aneous    -0.68
Paste     -0.68
rust      -0.66
entit     -0.65
rap       -0.62
arp       -0.62
roma      -0.61
rom       -0.60
Positive Logits
Token     Logit
face      1.01
faces     0.95
face      0.86
Faces     0.83
crow      0.82
Face      0.81
nces      0.79
Face      0.79
metics    0.78
faces     0.76
Activations Density: 0.023%