INDEX
Explanations
repetitive phrases or similarities in text
the repetition of the word "same."
Negative Logits
Ľ          -0.81
å§         -0.78
irtual     -0.73
HCR        -0.72
ä¿         -0.69
better     -0.67
amar       -0.67
éģ         -0.65
¿          -0.65
Ķ          -0.65
Positive Logits
thing      1.20
exact      1.16
amount     0.97
sized      0.95
guy        0.95
wording    0.95
size       0.92
principle  0.92
name       0.92
result     0.92
Activations Density 0.039%
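
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. The definition, the function name activation_density, the threshold parameter, and the sample data are all assumptions made for illustration; the dashboard does not state how its value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition for illustration only; the source does not specify
    how its density figure is calculated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the dashboard's style.
print(f"{density:.3%}")
```

Under this reading, a density of 0.039% would mean the feature fires on roughly 4 out of every 10,000 tokens, which fits a feature tied to a single word such as "same."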