INDEX
Explanations
mentions of things or entities being the same or equal to each other
phrases indicating equality or similarity
Negative Logits
glers     -0.72
uca       -0.66
ãĤ¡       -0.62
hiba      -0.60
nect      -0.59
inas      -0.58
ashtra    -0.57
ado       -0.56
atche     -0.56
ort       -0.56
Positive Logits
ours      0.88
opposed   0.85
those     0.82
theirs    0.82
usual     0.81
existed   0.80
regards   0.77
evidenced 0.74
yours     0.73
they      0.72
Activations Density 0.060%