INDEX
Explanations
exact words or phrases that are similar or repeated for emphasis
Negative Logits
ker      -0.94
rift     -0.83
itiz     -0.82
olyn     -0.80
kers     -0.78
jay      -0.78
rug      -0.74
roe      -0.74
asta     -0.74
isson    -0.73
Positive Logits
エ           0.89
opposite     0.86
aligned      0.83
wrong        0.83
matched      0.75
suited       0.75
positioned   0.75
機           0.74
Els          0.74
tuned        0.71
Activations Density 8.570%