Explanations
phrases related to public statements or opinions
instances of the word "comments" and related expressions discussing remarks made by individuals
Negative Logits
fruit        -0.73
Bride        -0.69
PG           -0.68
Recon        -0.68
ISH          -0.67
Rescue       -0.67
BLIC         -0.66
ILD          -0.62
riz          -0.62
tom          -0.61
Positive Logits
remarks      0.94
uttered      0.92
ariat        0.91
comments     0.91
dispar       0.75
guiActiveUn  0.73
slurs        0.73
briefings    0.72
oras         0.72
regarding    0.72
Activations Density: 0.039%