INDEX
Explanations
verbs or phrases related to actions or events in a historical or informative context
verbs and phrases related to accusations and assertions
Negative Logits
soType     -0.84
stack      -0.79
*/(        -0.74
Sensor     -0.73
AppData    -0.72
bite       -0.71
Bed        -0.69
Mom        -0.67
thank      -0.67
Drop       -0.66
Positive Logits
formatting      0.72
itled           0.69
nudity          0.68
mention         0.67
hov             0.66
humor           0.66
foreigners      0.66
references      0.64
specifically    0.64
homosexuality   0.63
Activations Density 0.520%