Explanations
phrases related to controversies or serious accusations
phrases related to references or descriptions of actions and concepts involving the word "of"
Negative Logits
casting      -0.69
bet          -0.67
preval       -0.63
istg         -0.61
acronym      -0.61
ringe        -0.60
monitor      -0.60
tenance      -0.60
projector    -0.60
flix         -0.59
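Logit lists like this one (and the positive list that follows) are commonly produced in SAE dashboards by multiplying a feature's decoder direction by the model's unembedding matrix and sorting the resulting per-token scores. This page does not state its method, so the sketch below is an assumption. The function names, the 3-dimensional vectors, and the four-token vocabulary are all hypothetical toy values, not data from the real model.

```python
"""Hypothetical sketch: rank tokens by a feature's direct logit effect.

Assumes the common approach of taking the dot product between the feature's
decoder direction and each token's unembedding column. All values are toys.
"""
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Scored, Scored]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    # Toy 3-dimensional residual stream with a four-token vocabulary (hypothetical).
    decoder_dir = [0.5, -0.2, 0.1]
    unembed = {
        "sorts": [1.0, 0.0, 0.5],
        "course": [0.8, 0.1, 0.0],
        "casting": [-1.0, 0.5, 0.0],
        "monitor": [-0.6, 0.4, -0.2],
    }
    positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", positive)
    print("negative:", negative)
```

With these toy inputs the positive side is led by "sorts" and "course" and the negative side by "casting" and "monitor". Sorting once and reading both ends of the ranking avoids a second sort.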
Positive Logits
ク (ãĤ¯)      0.66
sorts         0.65
ヘ (ãĥĺ)      0.64
irgin         0.64
course        0.63
minorities    0.62
glers         0.61
innocent      0.61
odan          0.61
certain       0.60
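The tokens ãĤ¯ and ãĥĺ are GPT-2-style byte-level BPE strings, in which every raw byte is displayed as a printable character. Decoding their bytes as UTF-8 gives the katakana ク and ヘ. The sketch below assumes the model uses GPT-2's byte-to-unicode table; the helper names are illustrative, not part of any dashboard API.

```python
"""Decode GPT-2-style byte-level BPE token strings back to readable text.

Assumption: the tokenizer uses GPT-2's byte-to-unicode mapping, where each
raw byte is shown as a printable character.
"""
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points 256 and above.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


CHAR_TO_BYTE = {char: byte for byte, char in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each display character back to its byte, then decode as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so replace
    # incomplete sequences instead of raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¯"))  # ク
print(decode_token("ãĥĺ"))  # ヘ
```

For example, ãĤ¯ maps to the bytes E3 82 AF, which is the UTF-8 encoding of U+30AF (ク).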
Activation Density: 0.194%
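Activation density is usually read as the share of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the function name and the toy data are hypothetical.

```python
"""Hypothetical sketch: activation density as the share of tokens where a feature fires."""
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy data: 2 of 1,000 tokens fire, so the density is 0.2%.
toy = [0.0] * 998 + [1.3, 0.7]
print(f"{activation_density(toy):.3f}%")  # 0.200%
```

Under this definition, a density of 0.194% corresponds to roughly 194 firing tokens per 100,000.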