INDEX
Explanations
phrases related to controversial or polarizing topics
topics related to social issues and controversial historical events
Negative Logits
Poké       -0.58
itar       -0.56
explor     -0.56
fray       -0.56
scram      -0.56
zoom       -0.54
map        -0.53
ambient    -0.52
grunt      -0.52
siege      -0.52
Positive Logits
instead    0.84
THEN       0.73
thereby    0.70
terday     0.67
hovah      0.65
Instead    0.63
rather     0.63
Instead    0.62
ardless    0.61
Therefore  0.59
Activations Density 1.675%