Explanations
instances of repetition or return phrases in the text
Negative Logits
Token         Logit
Thereafter    -0.46
attles        -0.42
PRS           -0.42
seg           -0.41
brun          -0.41
deshalb       -0.41   (German: "therefore")
initState     -0.41
국            -0.41
culata        -0.41
ropho         -0.41
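Positive Logits
Token          Logit
こちらも        1.20   (Japanese: "this one too")
bootstrapcdn   0.99
again          0.93
Again          0.93
szint          0.93
これも          0.91   (Japanese: "this too")
Again          0.88
同样            0.88   (Chinese, simplified: "likewise")
同樣            0.87   (Chinese, traditional: "likewise")
again          0.86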
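The dashboard does not say how these logit values were produced. Sparse-autoencoder dashboards commonly take the feature's decoder direction and project it through the model's unembedding matrix W_U. They then list the most negative and most positive tokens. The sketch below assumes that convention. The function name `top_logits` and its inputs are hypothetical, and pure Python is used to avoid any library dependency.

```python
def top_logits(direction, unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    Assumed inputs (hypothetical, not taken from the dashboard):
      direction: list of d_model floats, e.g. the feature's decoder row
      unembed:   d_model rows, each a list of len(vocab) floats (W_U)
      vocab:     token strings, index-aligned with the columns of W_U
    Returns (most_negative, most_positive), each a list of (token, logit).
    """
    d_model = len(direction)
    # Logit for token j is the dot product of the direction with column j of W_U.
    logits = [
        sum(direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending, so the most negative logits come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example with d_model = 2 and a three-token vocabulary.
neg, pos = top_logits(
    direction=[1.0, -1.0],
    unembed=[[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]],
    vocab=["again", "seg", "Again"],
    k=1,
)
print(neg, pos)  # [('seg', -1.0)] [('Again', 1.5)]
```

In real dashboards the direction is often normalized first, so the absolute logit values depend on that choice. The ranking of tokens does not.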
Activation Density: 0.478%
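Activation density is typically the percentage of token positions in a sample where the feature fires, meaning its activation exceeds zero. The dashboard does not state its threshold or sample size. This minimal sketch assumes a threshold of 0.0, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of four gives 25%.
print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0
```

Under this definition, a density of 0.478% corresponds to about 478 active positions per 100,000 tokens. Such a sparse feature fires only in specific contexts.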