Explanations
adverbs that describe manner or degree
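Negative Logits
logit   token
-0.58   adapt
-0.57   ècie
-0.57   met
-0.56   わかった (Japanese: "understood")
-0.56   поня (Russian word fragment, as in понять "to understand")
-0.56   卉 (Chinese: "plants")
-0.55   れた (Japanese verb suffix, past form of れる)
-0.55   れる (Japanese passive/potential verb suffix)
-0.54   わからない (Japanese: "don't understand")
-0.53   出る (Japanese: "to come out")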
Positive Logits
logit   token
2.05    sively
1.95    ously
1.91    ently
1.86    ically
1.83    istically
1.83    ificantly
1.83    tically
1.82    ctively
1.82    antly
1.81    mically
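Dashboards like this one commonly derive the two logit lists by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below assumes exactly that (no layer-norm folding or bias terms), which may differ from how these particular numbers were produced. The names `w_dec`, `W_U`, and `top_bottom_logits` are hypothetical, and the toy matrices stand in for real model weights.

```python
import numpy as np

def top_bottom_logits(w_dec, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary.

    w_dec: (d_model,) decoder vector for the feature (hypothetical input).
    W_U:   (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab: list of token strings, length vocab_size.
    Returns (top, bottom): k (token, logit) pairs with the largest and
    smallest logits respectively.
    """
    logits = w_dec @ W_U          # direct effect on every token's logit
    order = np.argsort(logits)    # token indices in ascending logit order
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    return top, bottom

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["ously", "ently", "adapt", "met"]
W_U = np.array([[1.0, 0.5, -0.2, 0.0],
                [0.0, 0.5, -0.4, -0.3]])
w_dec = np.array([2.0, 1.0])
top, bottom = top_bottom_logits(w_dec, W_U, vocab, k=2)
```

Here the projected logits are 2.0, 1.5, -0.8 and -0.3, so `top` holds "ously" and "ently" while `bottom` holds "adapt" and "met".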
Activation density: 2.794% (share of tokens on which the feature fires)
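Activation density is typically the percentage of sampled tokens on which the feature's activation is nonzero. A minimal sketch, assuming a strictly-positive threshold (as with a ReLU-style SAE) and a hypothetical 1-D array of per-token activations; the dashboard's own sampling and threshold are not stated here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which a feature fires.

    acts: 1-D sequence of the feature's activation on each sampled token.
    threshold: an activation must exceed this to count as firing (0.0 assumed).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires on 1 of 4 tokens, giving 25.0 (percent).
density = activation_density([0.0, 1.3, 0.0, 0.0])
```

Read the same way, a density of 2.794% means the feature fires on roughly 28 of every 1,000 tokens.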