INDEX
Explanations
phrases or terms indicating restriction or limitation to a specific group or purpose
phrases that indicate restrictions or limitations on use
Negative Logits
ahime     -0.67
idon      -0.60
insula    -0.59
raught    -0.58
duino     -0.58
Nurs      -0.57
illin     -0.57
PLUS      -0.57
env       -0.56
anon      -0.55
Positive Logits
marginally      0.97
ices            0.94
ICES            0.84
spor            0.80
incidentally    0.77
kidding         0.75
insofar         0.74
lasts           0.73
scratches       0.72
onse            0.71
Activations Density: 0.059%
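
To give a sense of scale for the density figure, the short sketch below converts the percentage into an expected count of activating tokens. It assumes the density is the fraction of sampled tokens on which the feature has a nonzero activation, which the page itself does not state. The function names and the 100,000-token sample size are hypothetical and chosen only for illustration.

```python
def density_to_fraction(density_percent: float) -> float:
    """Convert a density given in percent to a plain fraction."""
    return density_percent / 100.0


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumption: density is the share of tokens with a nonzero activation.
    """
    return density_to_fraction(density_percent) * n_tokens


if __name__ == "__main__":
    # 0.059% is the value reported above; 100_000 is an illustrative sample size.
    count = expected_active_tokens(0.059, 100_000)
    print(f"about {count:.0f} of 100,000 tokens")
```

Under that reading, the feature would fire on roughly 59 out of every 100,000 tokens, which makes it a sparse feature.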