Explanations
abstract structures and organization within documents
Negative Logits
greateſt          -1.07
itſelf            -1.03
purpoſe           -1.03
myſelf            -1.01
<<<<<<<<<<<<<<    -0.97
themſelves        -0.95
pleaſure          -0.93
houſe             -0.90
ſche              -0.89
ſever             -0.89
Positive Logits
ريكا           0.61
l              0.59
r              0.59
2              0.54
res            0.53
d              0.53
Hu             0.52
(not shown)    0.51
b              0.51
n              0.51
Activations Density 0.600%